ABSTRACT


We assume that p random variables, $y_1, \ldots, y_p$, are distributed
according to some multivariate normal distribution (called the p
variate normal). Methods of predicting the value of one, say $y_p$,
given the values of the other p-1 variables are discussed. A study
is made of the problems encountered whenever one tries to reduce the
number of variables used to predict $y_p$ and at the same time minimize
loss in prediction accuracy. Modifications of the step-wise procedure
of adding predictor variables one at a time are considered in
some detail, and methods of using an automatic high speed electronic
computer to perform the numerous calculations involved are described.
A high speed computer program was written to generate samples from
any specified p variate normal.

I wish to express my sincere gratitude to Professor Jack R.
Borsting, who in class introduced me to many of the mathematical
concepts used in this paper, and as faculty adviser provided the
guidance necessary to apply these concepts; and to Mrs. Bette Joe,
for her most capable typing of this paper.
Chapter I


INTRODUCTION 


The multivariate normal distribution with p variables, referred
to here as "the p variate normal", has been found to be useful as a
model for a wide variety of real world phenomena. This distribution
has been studied intensely in the literature and has many "nice"
mathematical properties.

One of the p variate normal's most useful properties is the
fact that when q of the variables are fixed, the remaining p-q
variables become a p-q variate normal, which has the same
variance-covariance matrix regardless of the actual fixed values of the first
q variables. When q equals p-1, the variable whose value is not
fixed, say $y_p$, becomes a conditional normal random variable whose
variance is no greater than the variance of $y_p$ when the variables
$y_1, \ldots, y_{p-1}$ are not fixed.

In chapter II, methods of "predicting" $y_p$ from known fixed
values of the other p-1 variables are described, and methods of
measuring the accuracy of prediction in terms of the variance of $y_p$
are given. These methods require that the p variate normal be
specified completely by a mean vector, $U$, and a variance-covariance
(V-C) matrix, $\Sigma$. In chapter III, methods of approximating the
work of chapter II using sample estimates of $U$ and $\Sigma$ are described.
These ideas are illustrated by an example in chapter IV.

After mastering the technique of regressing p-1 variables to
form a prediction equation for the last one, $y_p$, we turn to the
problem of eliminating variables that may not be useful in predicting
the value of $y_p$. Variables are eliminated by removing all reference
to them before the prediction equation is computed. Reasons for
reducing the number of variables in regression are presented in chapter
V. Later in chapter V, the process of eliminating variables from
regression is illustrated by an example using a specified five variate
normal.

At present, the only known way to find the "optimum set" of
r ($r \le p-1$) variables is to compute all $\binom{p-1}{r}$ regressions. Obviously
this involves extremely large numbers of computations for large
p, so that methods involving fewer computations are normally used.
Generally these faster methods produce "good" combinations of variables
in regression, but often they are not the "optimum" combination
for the same number of variables in regression.

Chapters VI through IX discuss methods of searching for a
satisfactorily small set of variables in regression that will reduce
the conditional variance of $y_p$ to a satisfactory level. The step-wise
procedure, described in chapter VI, provides the basic procedure
under study throughout the rest of the paper. Basically, this
procedure consists of adding variables to regression in steps. At
each step, the variable to be added is the one whose contribution
to variance reduction is greatest at that step. That this
procedure does not always produce optimal combinations of variables
in regression is demonstrated.

Also in chapter VI, a statistical test to be applied at each
step when a sample is being studied is described. This test provides
a criterion for halting the step-wise process which is a function
of sample size, n.

In chapter VII automatic regression analysis performance by a
high speed digital computer is discussed. Additional halting
criteria and other improvements to the step-wise procedure are
suggested. Halt criteria proposed by Miller [7] and Efroymson [3]
are reviewed in light of automatic regression analysis requirements.
A modification to the step-wise procedure reflecting differences in
cost of observation of variables is considered.

In chapter VIII computer programs MV REGRESSION and MV SIM,
written by the author, are presented. Basically, MV SIM generates
samples of a specified size, n, from a given p variate normal, on
which MV REGRESSION performs regression analyses. MV SIM also
computes regression parameters of the given p variate normal, the
results of which may be used as a standard for comparison
with the results of regression analysis of the samples.

In chapter IX current and proposed studies using these high
speed computer programs are outlined.

Appendix A describes the operation of program MV SIM in detail
and gives some background on the techniques used by MV SIM to generate
samples from specified p variate normals.

Appendix B describes statistical tests performed by MV SIM
on sample vectors, Z, and sample V-C matrices, S. Results of
tests performed on a number of generated samples of different sizes
of a five variate normal and an 18 variate normal are given.
Chapter II


THE P VARIATE NORMAL DISTRIBUTION


In this chapter we introduce the multivariate normal distribution
with p variables, hereinafter called the "p variate normal".
The basic theory associated with the p variate normal is given in
detail by Graybill [4] and Anderson [1]. Certain theorems and
formulas that are important for later work on regression analysis
are given here.

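A p variate normal is completely defined by any specified
p x 1 vector of means, $U$, and any p x p positive definite symmetric
variance-covariance (V-C) matrix, $\Sigma$. Let:

$$
Y = \begin{pmatrix} y_1 \\ \vdots \\ y_p \end{pmatrix}, \qquad
U = \begin{pmatrix} u_1 \\ \vdots \\ u_p \end{pmatrix}, \qquad
\Sigma = \begin{pmatrix}
\sigma_{11} & \cdots & \sigma_{1p} \\
\vdots & & \vdots \\
\sigma_{p1} & \cdots & \sigma_{pp}
\end{pmatrix}.
$$

The joint density function of the p variate normal, $Y$, is given
by:

$$
f(y_1, \ldots, y_p) = \frac{1}{(2\pi)^{p/2}\,|\Sigma|^{1/2}}\;
e^{-\frac{1}{2}(Y-U)'\,\Sigma^{-1}\,(Y-U)} \tag{2.1}
$$

for $-\infty < y_i < \infty$, $i = 1, \ldots, p$.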


The element $\sigma_{ij}$ of $\Sigma$ is the covariance between variables $y_i$
and $y_j$, and $u_i$ of $U$ is the mean of $y_i$.


If the p x 1 vector $Y$ is partitioned into two subvectors such
that:

$$
Y = \begin{pmatrix} Y_1 \\ Y_2 \end{pmatrix}
$$

(vectors $Y_1$ and $Y_2$ are (p-q) x 1 and q x 1 respectively, q < p),
and if:

$$
U = \begin{pmatrix} U_1 \\ U_2 \end{pmatrix}
\quad \text{and} \quad
\Sigma = \begin{pmatrix} \Sigma_{11} & \Sigma_{12} \\ \Sigma_{21} & \Sigma_{22} \end{pmatrix}
$$

are the corresponding partitions of $U$ and $\Sigma$, then it can be shown,
[4] section 3.6, that the conditional distribution of the q x 1 vector
$Y_2$ given the vector $Y_1 = Y_1^*$ (a constant vector), written $Y_2 | Y_1^*$, is the
multivariate normal distribution with q x 1 mean vector

$$
U_2 + \Sigma_{21}\Sigma_{11}^{-1}(Y_1^* - U_1),
$$

and q x q V-C matrix

$$
\Sigma_{22} - \Sigma_{21}\Sigma_{11}^{-1}\Sigma_{12}.
$$

From the latter matrix we see the important fact that the
covariance matrix of the conditional random vector $Y_2 | Y_1^*$ does not
depend upon the value of $Y_1^*$.

We shall represent the q x q V-C matrix of $Y_2 | Y_1^*$ as:

$$
\Sigma_{22 \cdot 1} = \Sigma_{22} - \Sigma_{21}\Sigma_{11}^{-1}\Sigma_{12}. \tag{2.2}
$$


In particular, each diagonal element, $\sigma_{ii \cdot 1, \ldots, p-q}$
($i = p-q+1, \ldots, p$), of this matrix is the conditional variance of
variable $y_i$ in $Y_2$, i.e. the variance of $y_i$ when the p-q variables
in $Y_1$ are fixed.


The element $\sigma_{ii}$ in the specified V-C matrix, $\Sigma$, is the variance
of $y_i$ in the original p variate normal distribution. That $\sigma_{ii}$ is
greater than or equal to $\sigma_{ii \cdot 1, \ldots, p-q}$ follows from formula 2.2
above, and the fact that $\Sigma_{21}\Sigma_{11}^{-1}\Sigma_{12}$ is positive semidefinite. In
fact, the following relationship holds (where $0 \le R_i \le 1$):

$$
\sigma_{ii \cdot 1, \ldots, p-q} = (1 - R_i^2)\,\sigma_{ii}, \qquad i = p-q+1, \ldots, p.
$$

In this formula $R_i$ is the multiple correlation coefficient between
variable $y_i$ and vector $Y_1$; see [4] section 3.6.
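The conditional mean vector, the conditional V-C matrix of formula 2.2, and the multiple correlation relationship can be checked numerically. The following is a minimal sketch in Python with NumPy, not a reproduction of any program described in this paper; the function name `conditional_mvn` and the two variable toy values are assumptions made only for illustration.

```python
import numpy as np

def conditional_mvn(U, Sigma, fixed_idx, fixed_values):
    """Mean vector and V-C matrix of the free variables, given the fixed ones.

    Hypothetical helper: fixed_idx lists the (0-based) positions of the
    variables in Y_1; all remaining variables form Y_2.
    """
    U = np.asarray(U, dtype=float)
    Sigma = np.asarray(Sigma, dtype=float)
    fixed_idx = list(fixed_idx)
    free_idx = [i for i in range(len(U)) if i not in fixed_idx]

    S11 = Sigma[np.ix_(fixed_idx, fixed_idx)]
    S12 = Sigma[np.ix_(fixed_idx, free_idx)]
    S22 = Sigma[np.ix_(free_idx, free_idx)]

    # X = S11^{-1} S12, obtained by solving a linear system rather than inverting.
    X = np.linalg.solve(S11, S12)
    diff = np.asarray(fixed_values, dtype=float) - U[fixed_idx]

    cond_mean = U[free_idx] + X.T @ diff   # U_2 + S21 S11^{-1} (Y_1* - U_1)
    cond_cov = S22 - S12.T @ X             # S22 - S21 S11^{-1} S12  (formula 2.2)
    return cond_mean, cond_cov

# Toy two variate normal (assumed values, chosen so the arithmetic is easy to follow).
U = [1.0, 2.0]
Sigma = [[4.0, 2.0],
         [2.0, 3.0]]
mean, cov = conditional_mvn(U, Sigma, fixed_idx=[0], fixed_values=[3.0])

# Squared multiple correlation recovered from sigma_cond = (1 - R^2) * sigma.
R2 = 1.0 - cov[0, 0] / Sigma[1][1]
print(mean, cov, R2)
```

For the toy values the conditional variance does not change if `fixed_values` is altered, which is the property noted in the text.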


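In this paper we will consider only the case where q = 1.
Now $Y = \begin{pmatrix} Y_1 \\ Y_2 \end{pmatrix}$, where $Y$ is still p x 1, $Y_1$ is (p-1) x 1, and $Y_2$ is the
variable $y_p$. Similarly, we partition $Y^* = \begin{pmatrix} Y_1^* \\ y_p^* \end{pmatrix}$, $U$, and $\Sigma$ so that
elements $Y_2$, $U_2$, and $\Sigma_{22}$ become $y_p$, $u_p$, and $\sigma_{pp}$ respectively.

It follows from earlier discussions that the distribution of
$y_p | Y_1^*$ is the univariate normal distribution with (scalar) mean:

$$
u_{p \cdot 1, \ldots, p-1} = E(y_p | Y_1^*) = u_p + \Sigma_{21}\Sigma_{11}^{-1}(Y_1^* - U_1), \tag{2.3}
$$

and scalar variance:

$$
\sigma_{pp \cdot 1, \ldots, p-1} = \sigma_{pp} - \Sigma_{21}\Sigma_{11}^{-1}\Sigma_{12}. \tag{2.4}
$$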
Let $\beta$ be the (p-1) x 1 vector $\beta' = \Sigma_{21}\Sigma_{11}^{-1}$. From
formulas 2.3 and 2.4, we can write:

$$
y_p | Y_1^* = u_p + \beta'(Y_1^* - U_1) + e \tag{2.5}
$$

or:

$$
y_p | Y_1^* = u_p + \sum_{i=1}^{p-1} \beta_i (y_i^* - u_i) + e,
$$

where $U_1$ is still $(u_1, \ldots, u_{p-1})'$, $Y_1^* = (y_1^*, \ldots, y_{p-1}^*)'$, and $e$ is a normally
distributed random variable with mean zero. The variance of $e$ is
$\sigma_{pp \cdot 1, \ldots, p-1}$, the value of which is independent of the actual values
of $y_1^*, \ldots, y_{p-1}^*$.
We define formula 2.5 as the prediction equation for $y_p$ associated
with the p variate normal. Most often we shall use it in the form:

$$
u_{p \cdot 1, \ldots, p-1} = E(y_p | Y_1^*) = u_p + \sum_{i=1}^{p-1} \beta_i (y_i^* - u_i). \tag{2.6}
$$


Now, if we know the fixed values of $Y_1^*$ (in addition to $U$ and $\Sigma$)
we can use 2.6 to compute the mean of the conditional random
variable $y_p | Y_1^*$. A measure of the "error" involved in using the
results of 2.6 to "predict" the value of $y_p$, when $Y_1^*$ is known, is
given by $\sigma_{pp \cdot 1, \ldots, p-1}$. By comparison, if the value of $Y_1^*$ is not
known, one might use the original mean of $y_p$, $u_p$, to "predict" the
value of $y_p$. The corresponding "error" of this prediction is given
by $\sigma_{pp}$, which is greater than or equal to $\sigma_{pp \cdot 1, \ldots, p-1}$. The values of the
scalars, $\beta_i$, in vector $\beta$ are called partial regression coefficients.

Suppose the computed values of some of the partial regression
coefficients, say $\beta_j$, $\beta_k$, etc., are zero, or close to zero. Then,
obviously, insofar as estimating $y_p$ is concerned, one can save the
effort and cost of observing the values of $y_j$, $y_k$, etc.

It often happens, especially when the number of variables,
p, is large, that some of the variables themselves can be
predicted rather accurately by a linear combination of other variables.
This shows that even if none of the partial regression coefficients
are close to zero, it may be possible to observe only a select few
of the variables and still predict $y_p$ nearly as accurately as when
all of the variables are used.

Of course, the values of the partial regression coefficients
to be used with each variable depend upon which other variables
are used in combination to predict $y_p$. Throughout this paper, any
combination of the original p-1 variables that are used to predict
$y_p$ in the manner just described will be said to be "in regression".
The variables whose values are not to be used to predict $y_p$ we shall
say are "not in regression".

Once a combination of variables to be in regression has been
chosen, a modified mean vector $U'$ and V-C matrix $\Sigma'$ are formed from
the original $U$ and $\Sigma$, respectively, by removing the $u_j$ from $U$ and
$\sigma_{ij}$ and $\sigma_{jk}$ (for all i and k) from $\Sigma$ for each variable $y_j$ that
is to be "not in regression". (If q of the p-1 variables
$y_1, \ldots, y_{p-1}$ are to be "not in regression", then $U'$ is (p-q) x 1 and
$\Sigma'$ is (p-q) x (p-q).) Thus, we see that all reference to those
variables not in regression is completely removed and a new p-q
variate normal is defined by $U'$ and $\Sigma'$, from which new prediction
equations (2.5 or 2.6) can be computed. Note that it is possible
to make up $\sum_{j=1}^{p-1} \binom{p-1}{j}$ prediction equations for predicting variable
$y_p$, one for each possible combination of variables $y_1, \ldots, y_{p-1}$.

In the chapters which follow we will discuss methods of estimating
the partial regression coefficients, $\beta_i$, and $\sigma_{pp \cdot 1, \ldots, q}$,
when the values of $U$ and $\Sigma$ are not known. Methods of choosing
which variables are "best" to use in regression will be discussed.
We shall also consider the problem of specifying relative "cost"
of observation per unit reduction in $\sigma_{pp \cdot 1, \ldots, q}$.
Chapter III


STATISTICAL ANALYSIS OF THE P VARIATE NORMAL


In this chapter we assume that $Y$ has a p variate normal
distribution with unknown mean vector $U$ and V-C matrix $\Sigma$. We are now
concerned with methods by which an experimenter can estimate $U$ and
$\Sigma$, and subsequently, other parameters, such as regression
coefficients for prediction equations for predicting $y_p$, and $\sigma_{pp \cdot 1, \ldots, q}$,
the conditional variance of $y_p | Y_1^*$ when variables $y_1, \ldots, y_q$ are in
regression. In order to distinguish estimates of parameters from
their associated theoretical values, it is convenient to develop new
notation to be used throughout this paper, listed here for easy
reference:
TABLE I

Notation for                                                     Notation for
Theoretical Values            Meaning of Parameter               Associated Estimated
                                                                 Parameters

$U$                           p x 1 mean vector of the p         $Z$
                              variate normal

$\Sigma$                      p x p V-C matrix of the p          $S$
                              variate normal

$\beta$ (beta)                q x 1 vector of regression         $B$
                              coefficients associated with
                              q variables in regression

$\sigma_{pp}$                 The element in row p, column       $s_{pp}$
                              p of $\Sigma$, which is the
                              (unconditional) variance of
                              $y_p$. $y_p$ is arbitrarily
                              chosen to be the variable to
                              be predicted.

$\sigma_{pp \cdot 1, \ldots, q}$   The conditional variance of   $s_{pp \cdot 1, \ldots, q}$
                              $y_p | Y_1^*$, where $Y_1^*$ is
                              a q x 1 vector of fixed values
                              of $y_1, \ldots, y_q$.
A sample of size n can be arranged into n x p matrix form
as follows:

$$
Y = \begin{pmatrix}
y_{11} & y_{12} & \cdots & y_{1p} \\
y_{21} & y_{22} & \cdots & y_{2p} \\
\vdots & \vdots & & \vdots \\
y_{n1} & y_{n2} & \cdots & y_{np}
\end{pmatrix},
$$

where $y_{ji}$ represents the j th observation of variable $y_i$. Note
that for this sample, observations of $y_p$, the variable later to
be predicted, are also required.

Sample means are computed as:

$$
z_i = \frac{1}{n} \sum_{j=1}^{n} y_{ji}, \qquad i = 1, \ldots, p.
$$

Sample covariances are computed as:

$$
s_{ik} = \frac{1}{n-1} \sum_{j=1}^{n} (y_{ji} - z_i)(y_{jk} - z_k).
$$
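For i = k the sample covariances become the sample variances:

$$
s_{ii} = \frac{1}{n-1} \sum_{j=1}^{n} (y_{ji} - z_i)^2.
$$

By analogy to the mean vector $U$ and V-C matrix, $\Sigma$, we form
the p x 1 sample mean vector, $Z$, and sample V-C matrix, $S$, as follows:

$$
Z = \begin{pmatrix} z_1 \\ \vdots \\ z_p \end{pmatrix}, \qquad
S = \begin{pmatrix}
s_{11} & \cdots & s_{1p} \\
\vdots & & \vdots \\
s_{p1} & \cdots & s_{pp}
\end{pmatrix}.
$$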
It can be verified easily that $Z$ and $S$ are unbiased estimates
of $U$ and $\Sigma$ respectively; and that $Z$ and $[(n-1)/n] \cdot S$ are maximum
likelihood estimates of $U$ and $\Sigma$.

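To develop estimates of the parameters of the conditional
distribution of $y_p | Y_1^*$ we recall that the random variable $y_p | Y_1^*$ is
normally distributed with mean and variance given by equations 2.3
and 2.4. We partition $Y$, $Y^*$, $Z$, $S$, as we did $Y$, $Y^*$, $U$, and $\Sigma$,
respectively, in Chapter II:

$$
Y = \begin{pmatrix} Y_1 \\ y_p \end{pmatrix}, \quad
Y^* = \begin{pmatrix} Y_1^* \\ y_p^* \end{pmatrix}, \quad
Z = \begin{pmatrix} Z_1 \\ z_p \end{pmatrix}, \quad \text{and} \quad
S = \begin{pmatrix} S_{11} & S_{12} \\ S_{21} & s_{pp} \end{pmatrix},
$$

where, as before:

$$
Y_1^* = \begin{pmatrix} y_1^* \\ \vdots \\ y_{p-1}^* \end{pmatrix} \quad \text{(constant vector)}
$$

and,

$$
Z_1 = \begin{pmatrix} z_1 \\ \vdots \\ z_{p-1} \end{pmatrix}, \qquad
S_{11} = \begin{pmatrix}
s_{11} & \cdots & s_{1,p-1} \\
\vdots & & \vdots \\
s_{p-1,1} & \cdots & s_{p-1,p-1}
\end{pmatrix}.
$$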
Since $z_i$ and $\frac{n-1}{n} s_{ij}$ are maximum likelihood estimates of $u_i$
and $\sigma_{ij}$ respectively, for $i, j = 1, \ldots, p$, it follows from the
invariance property of maximum likelihood estimates that:

$$
z_p + S_{21} S_{11}^{-1} (Y_1^* - Z_1) \quad \text{and} \quad
\frac{n-1}{n} \left( s_{pp} - S_{21} S_{11}^{-1} S_{12} \right)
$$

are maximum likelihood estimates of $u_p + \Sigma_{21}\Sigma_{11}^{-1}(Y_1^* - U_1)$, and
$\sigma_{pp} - \Sigma_{21}\Sigma_{11}^{-1}\Sigma_{12}$ respectively.
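We write:

$$
z_{p \cdot 1, \ldots, p-1} = z_p + S_{21} S_{11}^{-1} (Y_1^* - Z_1) \tag{3.1}
$$

and,

$$
s_{pp \cdot 1, \ldots, p-1} = \left( s_{pp} - S_{21} S_{11}^{-1} S_{12} \right) \frac{n-1}{n-p}. \tag{3.2}
$$

It can be shown that $z_{p \cdot 1, \ldots, p-1}$ and $s_{pp \cdot 1, \ldots, p-1}$ are
unbiased estimates of $u_{p \cdot 1, \ldots, p-1}$ and $\sigma_{pp \cdot 1, \ldots, p-1}$, the mean
and variance of the conditional random variable $y_p | Y_1^*$ respectively.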
Similarly, if we let $B' = S_{21} S_{11}^{-1}$, $B$ is a maximum
likelihood estimator of $\beta' = \Sigma_{21}\Sigma_{11}^{-1}$; that is, $b_i$ is a M.L.E. of
$\beta_i$ for $i = 1, \ldots, p-1$. It can be shown that $B$ is also an unbiased
estimator of $\beta$.


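We can write $B$ in the form (since $S_{11}$ is positive definite):

$$
B = S_{11}^{-1} S_{12}, \quad \text{that is,} \quad S_{11} B = S_{12}, \tag{3.3}
$$

or, written out in full:

$$
\begin{aligned}
b_1 s_{11} + \cdots + b_{p-1} s_{1,p-1} &= s_{1p} \\
&\;\;\vdots \\
b_1 s_{p-1,1} + \cdots + b_{p-1} s_{p-1,p-1} &= s_{p-1,p}.
\end{aligned}
$$

Equations 3.3 are called the normal equations.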


Substituting $z_{p \cdot 1, \ldots, p-1}$ for $u_{p \cdot 1, \ldots, p-1}$, we obtain an
unbiased estimate for the value of the prediction equation, 2.6, by:

$$
\hat{E}(y_p | Y_1^*) = z_{p \cdot 1, \ldots, p-1} = z_p + S_{21} S_{11}^{-1} (Y_1^* - Z_1). \tag{3.4}
$$
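The estimates of this chapter can be sketched in a few lines of Python with NumPy. This is an illustrative sketch only, not the MV REGRESSION program; the function name `sample_regression` and the synthetic data are assumptions. The last column of the n x p sample matrix is taken to be $y_p$, and the factor (n-1)/(n-p) follows formula 3.2.

```python
import numpy as np

def sample_regression(Y):
    """Estimate Z, S, B, the constant term, and s_pp.1,...,p-1 from an n x p sample.

    Hypothetical helper: the last column of Y is the variable to be predicted.
    """
    Y = np.asarray(Y, dtype=float)
    n, p = Y.shape

    Z = Y.mean(axis=0)                 # sample mean vector
    S = np.cov(Y, rowvar=False)        # sample V-C matrix, divisor n - 1

    S11 = S[:-1, :-1]
    S12 = S[:-1, -1]
    spp = S[-1, -1]

    B = np.linalg.solve(S11, S12)      # normal equations S11 B = S12 (3.3)
    s_cond = (spp - S12 @ B) * (n - 1) / (n - p)   # formula 3.2
    intercept = Z[-1] - B @ Z[:-1]     # constant term of the estimated equation 3.4
    return Z, S, B, intercept, s_cond

# Synthetic sample (assumed values): y_3 = 1 + 2 y_1 - 3 y_2 + small noise.
rng = np.random.default_rng(0)
X = rng.normal(size=(200, 2))
noise = rng.normal(scale=0.1, size=200)
y = 1.0 + 2.0 * X[:, 0] - 3.0 * X[:, 1] + noise

Z, S, B, intercept, s_cond = sample_regression(np.column_stack([X, y]))
print(B, intercept, s_cond)
```

With this synthetic sample the estimated coefficients should lie close to 2 and -3, and the estimated conditional variance close to the noise variance of 0.01.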
Chapter IV


AN EXAMPLE


Assume that an experimenter wishes to gather data from some
process involving five variables, which he assumes to be related
according to a five variate normal distribution. Suppose this five
variate normal actually is defined (completely) by the following
theoretical mean vector, $U$, and V-C matrix, $\Sigma$:¹


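$$
U = \begin{pmatrix} u_1 \\ u_2 \\ u_3 \\ u_4 \\ u_5 \end{pmatrix}
= \begin{pmatrix} 7.4600 \\ 48.1500 \\ 11.7700 \\ 30.0000 \\ 95.4200 \end{pmatrix} \tag{4.1}
$$

$$
\Sigma = \begin{pmatrix}
34.6025 & 20.9233 & -31.0517 & -24.1667 & 64.6633 \\
20.9233 & 242.1408 & -13.8783 & -253.4167 & 191.0792 \\
-31.0517 & -13.8783 & 41.0258 & 3.1667 & -51.5192 \\
-24.1667 & -253.4167 & 3.1667 & 280.1667 & -206.8083 \\
64.6633 & 191.0792 & -51.5192 & -206.8083 & 226.3133
\end{pmatrix} \tag{4.2}
$$

Using the developments of chapter II, we let $Y = \begin{pmatrix} Y_1 \\ y_5 \end{pmatrix}$, where
$Y_1 = (y_1, y_2, y_3, y_4)'$.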


1. The values of $U$ and $\Sigma$ used here were computed as sample vector $Z$
and V-C matrix $S$ using data from table 20.4, page 647 of Hald [H].
The results of tables 20.5 and 20.6 of Hald were used to verify the
results of computer program MV SIM, which performed most of the
computations required for this paper.
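We know that we are going to be given values of $y_1$, $y_2$, $y_3$, $y_4$
from which we will predict $y_5$. Hence, we must set up the prediction
equation for $y_5$ (equation 2.6). Accordingly, we partition $U$ and $\Sigma$
as:

$$
U = \begin{pmatrix} U_1 \\ u_5 \end{pmatrix}
= \left( \begin{array}{r} 7.4600 \\ 48.1500 \\ 11.7700 \\ 30.0000 \\ \hline 95.4200 \end{array} \right)
$$

$$
\Sigma = \begin{pmatrix} \Sigma_{11} & \Sigma_{12} \\ \Sigma_{21} & \sigma_{55} \end{pmatrix}
= \left( \begin{array}{rrrr|r}
34.6025 & 20.9233 & -31.0517 & -24.1667 & 64.6633 \\
20.9233 & 242.1408 & -13.8783 & -253.4167 & 191.0792 \\
-31.0517 & -13.8783 & 41.0258 & 3.1667 & -51.5192 \\
-24.1667 & -253.4167 & 3.1667 & 280.1667 & -206.8083 \\
\hline
64.6633 & 191.0792 & -51.5192 & -206.8083 & 226.3133
\end{array} \right)
$$

From $\beta' = \Sigma_{21}\Sigma_{11}^{-1}$, we get

$$
\beta = \begin{pmatrix} \beta_1 \\ \beta_2 \\ \beta_3 \\ \beta_4 \end{pmatrix}
= \begin{pmatrix} 1.5513 \\ .5103 \\ .1021 \\ -.1438 \end{pmatrix}.
$$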


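The prediction equation for $y_5$ becomes:

$$
E(y_5 | Y_1^*) = u_5 - \sum_{i=1}^{4} \beta_i u_i + \sum_{i=1}^{4} \beta_i y_i^* \tag{4.3}
$$

$$
= 95.4200 - (1.5513 \;\; .5103 \;\; .1021 \;\; -.1438)
\begin{pmatrix} 7.4600 \\ 48.1500 \\ 11.7700 \\ 30.0000 \end{pmatrix}
+ (1.5513 \;\; .5103 \;\; .1021 \;\; -.1438)
\begin{pmatrix} y_1^* \\ y_2^* \\ y_3^* \\ y_4^* \end{pmatrix}
$$

or,

$$
E(y_5 | Y_1^*) = 62.3881 + 1.5513\, y_1^* + .5103\, y_2^* + .1021\, y_3^* - .1438\, y_4^*. \tag{4.4}
$$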


The variance of $y_5 | Y_1^*$, $\sigma_{55 \cdot 1,2,3,4}$ (the conditional variance
of $y_5$ given $y_1$, $y_2$, $y_3$, $y_4$), is:

$$
\sigma_{55 \cdot 1,2,3,4} = \sigma_{55} - \Sigma_{21}\Sigma_{11}^{-1}\Sigma_{12}
= 226.3133 - 222.3242 = 3.9891,
$$

which is a measure of the prediction error when formula 4.4 is used
to predict $y_5$ when $Y_1^*$ is known. By comparison, if the values of $Y_1^*$
were ignored and if, instead, the value $u_5 = E(y_5) = 95.4200$ was
always used to estimate $y_5$, the corresponding measure of the
prediction error would be $\sigma_{55} = 226.3133$.
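The computations leading to formulas 4.3 and 4.4 can be retraced with a short NumPy sketch. It is offered as an illustration under the assumption that the entries of $U$ and $\Sigma$ are exactly the four decimal values of 4.1 and 4.2; it is not the MV SIM program, and small differences in the last decimal place from the figures quoted in the text are to be expected because the matrix is nearly singular.

```python
import numpy as np

# Mean vector (4.1) and V-C matrix (4.2) of the five variate normal.
U = np.array([7.4600, 48.1500, 11.7700, 30.0000, 95.4200])
Sigma = np.array([
    [ 34.6025,   20.9233,  -31.0517,  -24.1667,   64.6633],
    [ 20.9233,  242.1408,  -13.8783, -253.4167,  191.0792],
    [-31.0517,  -13.8783,   41.0258,    3.1667,  -51.5192],
    [-24.1667, -253.4167,    3.1667,  280.1667, -206.8083],
    [ 64.6633,  191.0792,  -51.5192, -206.8083,  226.3133],
])

# Partition: y_1..y_4 are in regression, y_5 is to be predicted.
Sigma11 = Sigma[:4, :4]
Sigma12 = Sigma[:4, 4]
sigma55 = Sigma[4, 4]

# Partial regression coefficients: solve Sigma11 * beta = Sigma12.
beta = np.linalg.solve(Sigma11, Sigma12)

# Constant term of the prediction equation (4.4): u_5 - sum(beta_i * u_i).
intercept = U[4] - beta @ U[:4]

# Conditional variance of y_5 given y_1..y_4 (formula 2.4).
cond_var = sigma55 - Sigma12 @ beta

def predict_y5(y_star):
    """Evaluate the prediction equation for given values y_1*, ..., y_4*."""
    return intercept + beta @ np.asarray(y_star, dtype=float)

print(beta, intercept, cond_var)
print(predict_y5(U[:4]))  # predicting at the means returns u_5
```

Evaluating `predict_y5` at the true means returns $u_5$, which is the check used later in this chapter.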
Thus, by knowing the mean vector $U$, and the V-C matrix $\Sigma$, as
given by formulas 4.1 and 4.2, we can set up the above prediction
equation, 4.4. Then for any set of values $y_1$, $y_2$, $y_3$, $y_4$, we can
make an accurate prediction of $y_5$ without observing its value.

The problem facing the experimenter is more complicated than
the one discussed in the preceding paragraphs. This is because
he does not know the values of the mean vector, $U$, and the V-C matrix,
$\Sigma$. All he knows is that (by assumption) $y_1$, $y_2$, $y_3$, $y_4$, $y_5$ are
distributed according to some five variate normal distribution, and,
therefore, are completely specified by some theoretical mean vector
$U$ and V-C matrix $\Sigma$ whose actual values he will never know.

Assume the experimenter draws a sample of size 500 from this
five variate normal distribution (specified by equations 4.1 and
4.2). He then computes all sample means, variances, and covariances
($z_i$, $s_{ii}$, $s_{ij}$, respectively) and forms the sample mean
vector, $Z$, and sample V-C matrix $S$ as defined in chapter III.
Suppose, as an example, he obtains the following results upon
drawing a sample of size 500:


$$
Z = \begin{pmatrix} z_1 \\ z_2 \\ z_3 \\ z_4 \\ z_5 \end{pmatrix}
= \begin{pmatrix} \text{[illegible]} \\ 48.7155 \\ \text{[illegible]} \\ 29.3039 \\ 96.1816 \end{pmatrix}
$$

$$
S = \begin{pmatrix} S_{11} & S_{12} \\ S_{21} & s_{55} \end{pmatrix}
= \left( \begin{array}{rrrr|r}
34.6160 & 20.1495 & -32.8248 & -21.5656 & 64.0139 \\
20.1495 & 217.9056 & -18.3443 & -223.1718 & 171.9229 \\
-32.8248 & -18.3443 & 42.8371 & 7.4936 & -57.4059 \\
-21.5656 & -223.1718 & 7.4936 & 242.3209 & -180.4153 \\
\hline
64.0139 & 171.9229 & -57.4059 & -180.4153 & 211.0392
\end{array} \right)
$$


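To estimate $\beta$ by $B$ we compute

$$
B = S_{11}^{-1} S_{12}
= \begin{pmatrix} 1.6360 \\ .5921 \\ .17\text{[illegible]} \\ -.0590 \end{pmatrix}.
$$

Hence, the estimate of the prediction equation 3.4 becomes:

$$
\hat{E}(y_5 | Y_1^*) = z_5 - \sum_{i=1}^{4} b_i z_i + \sum_{i=1}^{4} b_i y_i^*
$$

$$
= 54.5921 + 1.6360\, y_1^* + .5921\, y_2^* + .17\text{[illegible]}\, y_3^* - .0590\, y_4^*.
$$

The unbiased estimate of the conditional variance of $y_5 | Y_1^*$
would be:

$$
s_{55 \cdot 1,2,3,4} = \left( s_{55} - S_{21} S_{11}^{-1} S_{12} \right) \frac{n-1}{n-p}
= 4.0368 \times \frac{499}{495} = 4.069.
$$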


Now the experimenter is in a position to predict the value of $y_5$
given a set of values $y_1^*$, $y_2^*$, $y_3^*$, $y_4^*$. For example, suppose he
is given that the $y_i^* = u_i$, the true means of the $y_i$. (Of course,
he doesn't know that these are the true means.)

Using the true prediction equation, we get (see 4.3):

$$
E(y_5 | Y_1^* = U_1) = u_5 - \sum_{i=1}^{4} \beta_i u_i + \sum_{i=1}^{4} \beta_i u_i = 95.42.
$$

The experimenter would estimate this value as:

$$
\begin{aligned}
\hat{E}(y_5 | Y_1^* = U_1) = 54.5921 &+ (1.6360)(7.4600) \\
&+ (.5921)(48.1500) \\
&+ (.17\text{[illegible]})(11.7700) \\
&+ (-.0590)(30.0000) = 95.623.
\end{aligned}
$$
Chapter V


REDUCTION IN THE NUMBER OF VARIABLES IN REGRESSION - INTRODUCTION


Experience has shown that when the number of variables, p, is
large, say over 20, usually a relatively small number of variables
can be found to use in regression to predict $y_p$ nearly as accurately
as when all p-1 variables are used in regression, [9], page 20.
Finding such a small combination of variables is desirable for a
number of reasons:

1) The prediction equation has fewer terms; thus it is
   easier to compute a predicted value of $y_p$.

2) Fewer variables need to be observed in order to make
   a prediction of $y_p$. Presumably this would result in
   reducing the cost of observing variables for each
   prediction of $y_p$.

3) When p is large, the prediction equation involving p-1
   variables requires many computations. Step-wise
   procedures, described later, when yielding a relatively
   small number of variables in regression, produce a
   prediction equation with much less effort.

4) When the regression is being performed on a sample,
   variables that do not contribute much variance reduction
   of $y_p$ can actually cause the prediction equation
   to yield a worse fit to the underlying (specified) p
   variate normal than would result if they were omitted
   from regression. The reason is that the longer equation
   can overfit the sample and ascribe some of the variation
   due to small scale random fluctuations to one of the
   predictors "by accident".


As one would suspect, whenever a single variable, $y_k$, is added
to regression the old conditional variance, say, $\sigma_{pp \cdot 1, \ldots, q}$, is
always greater than or equal to the new conditional variance,
$\sigma_{pp \cdot 1, \ldots, q, k}$. However, usually the amount by which $\sigma_{pp \cdot 1, \ldots, q}$
is reduced becomes small as the number of variables in regression
increases, even though optimal combinations for each number of
variables in regression are used. To illustrate this idea, let us
consider an example of a regression problem under ideal conditions.
That is, we shall examine a p variate normal specified in terms of
vector $U$ and matrix $\Sigma$.

We first compute the prediction equation for $y_p$, and the
associated conditional variance $\sigma_{pp \cdot 1, \ldots, q}$, for each possible
combination of variables $y_1, \ldots, y_{p-1}$ in regression ($\sum_{j=1}^{p-1} \binom{p-1}{j}$
sets of prediction equations to solve). We shall then group the
results according to number of variables in regression, and from
each group pick the "optimal" combination of variables in
regression; that is, the combination of variables, say, $y_1, \ldots, y_q$, in
regression producing the smallest $\sigma_{pp \cdot 1, \ldots, q}$.¹


1. Clarification of notation: The reader should understand that
whenever a "combination of variables in regression, say, $y_1, \ldots, y_q$,
and the associated conditional variance, $\sigma_{pp \cdot 1, \ldots, q}$" is discussed
as in the preceding paragraph, the q variables in regression are not
necessarily meant to be regarded as the first q variables as defined
by position in the original vector $U$ and matrix $\Sigma$. In other words,
in order to ease notational difficulty, variables in regression are
temporarily relabeled $y_1, \ldots, y_q$.
With this grouping, we can now start with one variable in
regression and add to the number of variables in regression one at
a time, each time choosing the "optimal" combination of variables
for that group, until we decide that adding more variables to
regression will not reduce $\sigma_{pp \cdot 1, \ldots, q}$ enough to make it worthwhile.

In our example, we shall use the five variate normal as defined
by 4.1 and 4.2. Using only $y_1$ in regression, the prediction
equation becomes:

$$
E(y_5 | Y_1^*) = u_5 - \beta_1 u_1 + \beta_1 y_1^*,
$$

where

$$
\beta_1 = \Sigma_{21}\Sigma_{11}^{-1} = 64.6633 / 34.6025 = 1.8687,
$$

and

$$
\sigma_{55 \cdot 1} = \sigma_{55} - \Sigma_{21}\Sigma_{11}^{-1}\Sigma_{12}
= 226.3133 - \frac{64.6633 \times 64.6633}{34.6025} = 105.4739.
$$

Similarly, we compute partial regression coefficients, $\beta_i$,
for all 15 possible combinations of the variables $y_1$, $y_2$, $y_3$, $y_4$
in regression. Table II shows the results.


TABLE II

Variables                 Partial Regression Coefficients              Associated
in                                                                     Conditional
Regression       q     beta_1      beta_2      beta_3      beta_4      Variance

 y1              1     1.8687                                          105.4739
 y2              1                  .7891                               75.5280
 y3              1                            -1.2557                  161.6167
*y4              1                                         -.7381       73.6553
*y1 y2           2     1.4683       .6622                                4.8261
 y1 y3           2   [illegible]            [illegible]                102.2551
 y1 y4           2     1.4399                              -.6139        6.2303
 y2 y3           2                  .7313     -1.0083                   34.6208
 y2 y4           2                  .3108                  -.4569       72.4065
 y3 y4           2                            -1.1998      -.7245       14.6447
 y1 y2 y3        3     1.6959       .6569       .2500                    4.0096
*y1 y2 y4        3   [illegible]    .4160                  -.2365        3.9982
 y1 y3 y4        3     1.0518                  -.4100      -.6427        4.2368
 y2 y3 y4        3                 -.9234     -1.4479     -1.5570        6.1506
*y1 y2 y3 y4     4     1.5513       .5103       .1021      -.1438        3.9891

Note: Each group is identified by the value of q.

* Indicates the "optimal" combination of variables for the group
(for that number of variables in regression).
We now select the optimal combination from each group of variables
in regression as follows (we omit the partial regression coefficients):

TABLE III

Group Number     Variables in Regression       Associated Conditional
     q           (Optimal Combination)         Variance of $y_5$

     0           None                          $\sigma_{55}$ = 226.3133
     1           y4                            $\sigma_{55 \cdot 4}$ = 73.6553
     2           y1, y2                        $\sigma_{55 \cdot 1,2}$ = 4.8261
     3           y1, y2, y4                    $\sigma_{55 \cdot 1,2,4}$ = 3.9982
     4           y1, y2, y3, y4                $\sigma_{55 \cdot 1,2,3,4}$ = 3.9891

From table III we immediately see that most of the reduction of
the conditional variance of $y_5$ can be done by introducing only two
of the possible four variables into regression, namely $y_1$ and $y_2$.
Very little more is accomplished by using the other two variables,
given that $y_1$ and $y_2$ are going to be used in regression.
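The exhaustive search over all 15 combinations can be sketched as follows in Python with NumPy. This is an illustration under the assumption that $\Sigma$ is exactly the matrix of 4.2; the helper name `conditional_variance` and the 0-based indexing (index 0 stands for $y_1$, index 4 for $y_5$) are choices made only for this sketch.

```python
from itertools import combinations

import numpy as np

# V-C matrix (4.2); rows and columns 0..3 are y_1..y_4, index 4 is y_5.
Sigma = np.array([
    [ 34.6025,   20.9233,  -31.0517,  -24.1667,   64.6633],
    [ 20.9233,  242.1408,  -13.8783, -253.4167,  191.0792],
    [-31.0517,  -13.8783,   41.0258,    3.1667,  -51.5192],
    [-24.1667, -253.4167,    3.1667,  280.1667, -206.8083],
    [ 64.6633,  191.0792,  -51.5192, -206.8083,  226.3133],
])

def conditional_variance(Sigma, subset, target=4):
    """Conditional variance of the target given the variables in `subset`."""
    idx = list(subset)
    A = Sigma[np.ix_(idx, idx)]      # V-C matrix of the variables in regression
    b = Sigma[idx, target]           # their covariances with the target
    return Sigma[target, target] - b @ np.linalg.solve(A, b)

all_results = {}   # every combination -> conditional variance
best = {}          # group number q -> (optimal combination, conditional variance)
for q in range(1, 5):
    group = {s: conditional_variance(Sigma, s) for s in combinations(range(4), q)}
    all_results.update(group)
    optimal = min(group, key=group.get)
    best[q] = (optimal, group[optimal])

for q, (subset, var) in best.items():
    names = ", ".join(f"y{i + 1}" for i in subset)
    print(q, names, round(var, 4))
```

The number of entries in `all_results` is 15, one per prediction equation, which makes plain why the same loop becomes impractical as p grows.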


Note that the five variate normal is easily handled by an
electronic computer because only $\sum_{j=1}^{4} \binom{4}{j} = 15$ prediction equations had
to be computed, none with more than five variables involved. On
the other hand, the 18 variate normal, for example, requires
$\sum_{j=1}^{17} \binom{17}{j} = 131{,}071$ prediction equations, most of which involve many
variables. Hence this procedure is not always feasible even when
today's high speed electronic computers are available.


It is interesting to note that when all four variables are in
regression the values of the regression coefficients do not suggest
which variables might be best to eliminate from regression. In fact,
none of the values are close enough to zero to indicate that any
should be removed.

In this chapter it has been shown that we can expect the amount
of reduction in the conditional variance of $y_p$ to be less per variable
added to regression when the number of variables in the optimal
combination becomes larger. Thus, if one were willing to state in advance
his maximum allowable value of the conditional variance of $y_p$, the
problem would be a straightforward one of searching table II for the
minimum number of variables producing that conditional variance or
less. We now restate this same problem in the above terms:

"To find some satisfactorily small number of variables,
q, ($q \le p-1$), that, when used to predict $y_p$, reduces $\sigma_{pp \cdot 1, \ldots, q}$
to some satisfactorily small fraction of the unconditional variance
of $y_p$, $\sigma_{pp}$."
Chapter VI


THE STEP-WISE PROCEDURE


We now discuss an alternate procedure of searching for optimal
combinations of variables in regression, called the step-wise
procedure. This procedure has the advantage of reducing the number of
prediction equations to be solved from $\sum_{r=1}^{p-1} \binom{p-1}{r}$, as in chapter V,
to p-1 or less, thus keeping the number of computations to within
the capability of today's high speed electronic computers. We shall
see that the combination of variables selected by this method is not
always optimal, i.e., it is possible that a different set of the same
number of variables might yield a more accurate prediction equation
for $y_p$. However, practical experience indicates that sets decidedly
better than those discovered by the procedure outlined in this chapter
are rare, [9], page 19. We shall discuss additional problems
encountered when the step-wise procedure is applied to a sample. The
need for statistical tests at each step is demonstrated and an actual
test is developed.

The step-wise procedure is as follows: At each step every
variable not yet in regression is examined to see how much the conditional
variance of $y_p$ would be decreased if it, alone, were added to the
variables already in regression, i.e., assuming q variables are already
in regression, the quantity $\sigma_{pp \cdot 1, \ldots, q} - \sigma_{pp \cdot 1, \ldots, q, m}$ is computed
for each variable, $y_m$, still not in regression. The variable to be
added to regression is $y_k$, the variable for which this computation is
greatest; i.e., $y_k$ is chosen from the variables not in regression, $y_m$,
so that

$$
\sigma_{pp \cdot 1, \ldots, q} - \sigma_{pp \cdot 1, \ldots, q, k}
= \max_m \left[ \sigma_{pp \cdot 1, \ldots, q} - \sigma_{pp \cdot 1, \ldots, q, m} \right]
$$

or equivalently, such that:

$$
\sigma_{pp \cdot 1, \ldots, q, k} = \min_m \; \sigma_{pp \cdot 1, \ldots, q, m}.
$$

We illustrate this procedure by applying it to the p variate
normal specified by equations 4.1 and 4.2. This illustration can
be followed most easily if reference is made to table II of
chapter V.

Step I:   Compute all four conditional variances of group 1,
          and choose the smallest value (73.6553).
          action:  add variable $y_4$ to regression
          results: variables in regression: $y_4$
                   $\sigma_{55 \cdot 4}$ = 73.6553

Step II:  Compute the conditional variances of group 2 that
          include variable $y_4$ in regression, and choose the
          smallest value (6.2303).
          action:  add variable $y_1$ to regression
          results: variables in regression: $y_1$, $y_4$
                   $\sigma_{55 \cdot 1,4}$ = 6.2303

Step III: Compute the conditional variances of group 3 that
          include variables $y_1$ and $y_4$ in regression, and
          choose the smallest value (3.9982).
          action:  add variable $y_2$ to regression
          results: variables in regression: $y_1$, $y_2$, $y_4$
                   $\sigma_{55 \cdot 1,2,4}$ = 3.9982

Step IV:  Add the last variable.
          results: variables in regression: $y_1$, $y_2$, $y_3$, $y_4$
                   $\sigma_{55 \cdot 1,2,3,4}$ = 3.9891


As in the preceding chapter, we immediately see that most of
the variance of $y_5$ can be eliminated by using only two
of the possible four variables in regression. However, this time
the pair chosen were variables $y_1$ and $y_4$ instead of $y_1$ and
$y_2$, producing a conditional variance of 6.2303 instead of 4.8261.

The step-wise procedure is equally applicable to analysis of
a sample of size n. In this case all information is obtained from
the sample vector, Z, and sample V-C matrix, S. In particular,
the values of the sample conditional variances, $s_{pp \cdot 1,\ldots,q}$,
rather than $\sigma_{pp \cdot 1,\ldots,q}$, are used at each step to determine
the next variable to enter regression. As before, p-1 prediction
equations, and associated estimated conditional variances of $y_p$,
$s_{pp \cdot 1,\ldots,q}$, can be obtained. Each succeeding equation will
contain one more variable in regression, and usually¹ will have a
smaller value of $s_{pp \cdot 1,\ldots,q}$. Now, as in chapter V, the most
acceptable combination of variables in regression, for which the
estimated conditional variance of $y_p$ is small enough, can be chosen.

1. The exception can occur when the sample size, n, is small. See
Hald's example table 20.6, where n = 13 [6].

At this point we must consider a problem that is ever present
whenever a sample is used as a source of information. In the present
case the problem is stated as follows: How do we know that the sample
size, n, was large enough, so that the conditional variance associated
with the combination we have just selected is accurate? (We will
always assume that n is greater than p.)

Intuitively, if n is just a little larger than p we should not
have much confidence in sample vector Z and sample V-C matrix S,
nor in the estimated regression coefficients or conditional variances
of $y_p$. In fact we shouldn't be surprised if a second sample of the
same size were to produce a completely different set of variables when
the same step-wise procedures are used. On the other hand, as n
approaches infinity the sample values Z and S approach the true values
of U and $\Sigma$. It is clear that at each step, each variable that is a
candidate to enter regression should be given a statistical test of
some kind.

Suppose q variables, $y_1,\ldots,y_q$, are already in regression with
estimated conditional variance of $y_p$ given by $s_{pp \cdot 1,\ldots,q}$;
and suppose that we are considering variable $y_k$ for addition to
regression. It can be shown that if actually
$\sigma_{pp \cdot 1,\ldots,q,k} = \sigma_{pp \cdot 1,\ldots,q}$, the statistic

(6.1)   $$F = \frac{(n-q-1)\, s_{pp \cdot 1,\ldots,q} - (n-q-2)\, s_{pp \cdot 1,\ldots,q,k}}{s_{pp \cdot 1,\ldots,q,k}}$$

has the F distribution with 1 and n-q-1 degrees of freedom [4],
section 6.4. Furthermore, statistic F will tend to be greater than
$F_{(1,\,n-q-1)}$ if $\sigma_{pp \cdot 1,\ldots,q,k}$ is actually less than
$\sigma_{pp \cdot 1,\ldots,q}$.

We immediately encounter a new complication: The above statistic
F behaves as stated above so long as variable $y_k$ is studied by itself.
However, the selection of $y_k$ from among those variables still not in
regression was not completely at random. $y_k$ was chosen at this time
because it was estimated to be the "best" variable to add at this
step. In other words, we are in effect computing F for a number of
variables and choosing the variable for which F is the largest. It
is important to realize that due to this method of selection, the F
statistic used with the selected variable $y_k$ will tend to be larger
than would be expected on the average if variable $y_k$ were to be studied
as an individual variable alone. Intuitively, this effect should be
stronger with the first variables added to regression, since those
variables for which F is large due to randomness are removed from
those not in regression early. Suggested procedures for compensating
for this are discussed in a later chapter.
Let $\alpha$ be the probability of erroneously concluding that
$\sigma_{pp \cdot 1,\ldots,q,k}$ is less than $\sigma_{pp \cdot 1,\ldots,q}$ whenever
actually they are equal ($\alpha$ is usually chosen to be .05). This error
is usually called the type one error. Suppose now, at each step we
compute the statistic F of formula 6.1, and compare with the value of
$F_\alpha(1,\,n-q-1)$ which can be found in tables of the F distribution.
If the sample size is too small the power of the test will be low.
This means that the actual difference between $\sigma_{pp \cdot 1,\ldots,q}$ and
$\sigma_{pp \cdot 1,\ldots,q,k}$ can be substantial and still, the probability
that the computed statistic, F, will exceed $F_\alpha(1,\,n-q-1)$ can be
small. (Of course, this probability will always be greater than
$\alpha$.) This error is usually called the type two error.


On the other hand, given that $\sigma_{pp \cdot 1,\ldots,q}$ is actually
greater than $\sigma_{pp \cdot 1,\ldots,q,k}$ (the only alternative being that
they are equal), regardless of how small the actual difference, the
probability that statistic F exceeds $F_\alpha(1,\,n-q-1)$ can be made as
close to one as we please by increasing sample size, n, indefinitely.

Meanwhile, among those variables for which $\sigma_{pp \cdot 1,\ldots,q,k}$ is
actually equal to $\sigma_{pp \cdot 1,\ldots,q}$, approximately $\alpha \times 100$
percent are expected to "pass" the F test (i.e., $F > F_\alpha(1,\,n-q-1)$)
independently of the sample size, n.

We have just seen that the two important factors that affect the
probability that variable $y_k$ will pass a particular F test are the
amount by which the actual values of $\sigma_{pp \cdot 1,\ldots,q}$ and
$\sigma_{pp \cdot 1,\ldots,q,k}$ differ, and the size of the sample, n. Thus,
the decision rule we might use is to terminate the step-wise procedure
at any step at which all variables still not in regression fail to pass
the F test. With this decision rule, the F test will limit the
variables in regression to those whose contribution to reduction in
conditional variance of $y_p$ appears to be large enough for the given
sample to measure.

In the next chapter we shall consider additional halt criteria
which an experimenter may wish to impose on the step-wise process.


Chapter VII


AUTOMATIC REGRESSION ANALYSIS - CRITERIA
FOR HALTING STEP-WISE REGRESSION


In this chapter we shall develop useful procedures for conducting
automatic regression analysis on a sample of size n of a p
variate normal using a high speed electronic computer. Efroymson
[3] has developed an algorithm very suitable for computer use in
which any single variable can be added to, or eliminated from,
regression (depending upon its former status). At any step the
regression coefficients, conditional variance of $y_p$, multiple
correlation coefficient of $y_p$ on the variables in regression, and many
other desirable parameters can be computed easily and printed out.
Useful criteria for halting the regression process are discussed
and developed.

Given a sample of size n of a p variate normal, formulas for
computing vector Z and matrix S have already been described. Also,
basically we shall use the step-wise procedure of adding variables
to regression. The most important remaining problem is to consider
how the user of an automatic regression analysis computer program
can specify, in advance of the computer run, reasonable criteria for
halting the step-wise procedure.

So far, it appears that a satisfactory criterion for stopping
the regression process has never been fully developed to suit
automatic step-wise regression. Miller [7] proposes adding variables
until the F test fails. He also proposes a method of adjusting the
level for which the critical F is chosen ($1 - \alpha$ in chapter VI) to
compensate for the fact that the method of choosing each variable
to enter regression is not a random choice:


In order to derive a test for the statistical significance
of $X_i$, the following analysis may be performed: When a
predictor is chosen at random from a group of predictors,
an F test is performed where the critical F is usually
taken at the 95% level. This allows for a one in twenty
chance for considering this predictor significant when in
fact it is not. In the screening procedure the selection
of $X_i$ is not a random choice. Therefore, it is necessary
to determine at what probability level the critical F
should be taken while still specifying a one in twenty
chance occurrence.

For the screening procedure it appears proper to make the
level for which the critical F is chosen a function of the
number of possible predictors, n. The ordinary 95% level
F can be expressed as

$$F_{.95} = F_{(1 - .05)}$$

and for the screening procedure the 95% level is

$$F'_{.95} = F_{(1 - .05)^{1/n}}$$
Intuitively, Miller's solution seems to be somewhat extreme.
For example, if p = 51 (and $\alpha$ = .05) then at the first step the
level chosen for the critical F, $\alpha'$, would be computed as follows:

$$1 - \alpha' = (1 - .05)^{1/50} = .999; \qquad \alpha' = .001$$

so that the value used for comparison would be $F_{.999}(1,49) = 12.2$,
rather than $F_{.95}(1,49) = 4.03$ when no adjustment is made. In this
case the critical F value is arbitrarily tripled only because there
are 50 variables still not in regression. Granted that the critical
F should be adjusted upward in order to maintain a "one in twenty
chance occurrence", it would seem that due to lack of information
as to the extent of this non-random effect, one should make such an
adjustment more conservatively than this.
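The arithmetic of the p = 51 example can be checked with a short sketch. It assumes the adjustment has the form 1 - alpha' = (1 - alpha)^(1/k), where k is the number of predictors still available, which is how the worked example above reads; the function name is ours.

```python
def adjusted_level(alpha, k):
    """Return alpha' such that 1 - alpha' = (1 - alpha) ** (1 / k)."""
    return 1.0 - (1.0 - alpha) ** (1.0 / k)


# First step of the example: p = 51, so 50 variables are not yet in regression.
ALPHA_PRIME = adjusted_level(0.05, 50)
print(f"alpha' = {ALPHA_PRIME:.5f}")

# With a single candidate no adjustment is made.
print(f"alpha' for one candidate = {adjusted_level(0.05, 1):.5f}")
```

The adjusted level falls to roughly one tenth of one percent, which is why the critical F in the example is so much larger than the unadjusted value.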

Perhaps a satisfactory "hedge" might be to use the adjusted 
levels 


/ 
~ To ,or O&= 
gp log p 








K, 


where K is a constant inserted by the program user for his
particular sample.

Conceivably, one might wish to make no adjustment at all for 
this effect because the consequences of increasing the type two 
error during the early steps are so detrimental to the step-wise 
procedure. 

Efroymson [3] proposes two F tests at each step. His program
first compares each variable, $y_i$, currently in regression with an
appropriate "min F" critical value to see if it still passes the
F test of significance. If a variable that fails this test is
discovered, the action at that step is to remove the variable from
regression. By setting min F to a value slightly less than the
standard critical value used for adding variables, the possibility of
creating an endless loop is avoided.

This feature is appealing because new combinations in regression
obtained in this manner are always more nearly optimal (as far as
the sample is concerned) than was the preceding combination of the
same size; yet the number of computer instructions required to do
this operation is minimal.

In chapter VI it was shown that the choice of $F_\alpha(1,\,n-q-1)$ as
the value of critical F is made in an attempt to limit the variables
in regression to those whose contribution to reduction in conditional
variance of $y_p$ is large enough for the given sample to measure. It
is clear that Efroymson's double F test contributes to this effort
by insuring that all variables in regression continue to pass the
F test even after subsequent variables have been added.

It is impossible to anticipate here all uses for different 
combinations of specified stopping criteria. We already have seen 
that statisticians so far have only provided general guide lines 
in this area. This is mainly because each individual p variate 
normal distribution has its own set of complications, and for each 
computer run on a given sample the experimenter may have varying 
amounts of prior information regarding the p variate normal he is 
studying. Thus, for any automatic regression analysis computer 
program it is important that the user of the program be able to 
specify halt criteria with as much flexibility as possible. 

Perhaps the most important aspect of each halt criterion is
that it must be specifiable in a manner most meaningful to the
experimenter. For example, some experimenters under certain conditions
may not look upon the F test of chapter VI as being useful at
all. Quite likely, such an experimenter may wish to replace
$F_\alpha(1,\,n-q-1)$ with a value, say $\Delta$, to be the critical amount
of reduction of the variance of $y_p$ as a stopping rule; or, he may want
to specify both critical values. Although $\Delta$ and $F_\alpha(1,\,n-q-1)$
are in different units, it is clear that the $\Delta$ test is equivalent
to an F test, so that in specifying both tests the experimenter is
merely having the computer apply whichever test is the most stringent
at each step.

The following example illustrates most of the points covered
in the last few paragraphs. We show here how a suitable choice of
min F and critical F, artificially chosen, can aid Efroymson's
double F test procedure to find a more nearly optimal combination
of variables in regression than already obtained by the step-wise
procedure at a previous step. To do this we take an example worked
out by Hald [6], section 20.4. In this example, Hald used data
from a sample of size 13 of a five variate distribution which we
will assume here to be normal. The sample vector, Z, and sample
V-C matrix, S, are the same as those shown by equations 4.1 and
4.2 in this paper, which in chapter IV were used to define U and
$\Sigma$, respectively. In the following illustration we shall consider
4.1 and 4.2 to be computed Z and S as in Hald's example.

From Hald's example we compute the F statistic [6.1] for each
variable in regression and not in regression at each step. See
table IV below. For variables in regression, $y_i$, the value
computed is the F statistic that would be computed for $y_i$ if it, alone,
were removed from regression first. These values pertaining to
variables currently in regression are underscored in table IV. In
order to illustrate the above points, the F test using
$F_\alpha(1,\,n-q-1)$ was eliminated.

We now choose the artificial values of critical F and min F to
be 4.5 and 3.0 respectively. With this choice we shall obtain the
optimal combination of variables $y_1$ and $y_2$, where the regular
forward step-wise procedure yielded variables $y_1$ and $y_4$ in
chapter VI.
                            Table IV

        Variables
        in Regr.            Computed F Statistic [6.1]
        Before
Step    This Step        y1         y2         y3         y4

  1     none            12.60      21.96       4.40     (22.80)
  2     y4            (108.16)       .17      40.30     _22.80_
  3     y1, y4        _108.16_     (5.03)      4.24    _159.21_
  4     y1, y2, y4    _154.02_     _5.03_       .02      _1.86_
  5     y1, y2        the optimal combination for
                      two variables in regression


The variables added to or eliminated from regression were chosen
according to Efroymson's double F test procedure. Recall that no
variable was to be added at any given step if the F value of one of
the variables already in regression got below 3.0 (min F). Hence
at step 4, variable $y_4$ was eliminated yielding the optimal
combination $y_1$, $y_2$. At each previous step, the variable added (whose
value is enclosed in parentheses) was chosen because its F statistic was
the largest among those still not in regression and was also greater
than the critical F value which was artificially chosen to be 4.5.
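The add-or-remove logic just described can be sketched as follows. Several things here are assumptions rather than part of this paper: the data are the 13-observation cement data usually associated with Hald's example, typed in from commonly published values and not taken from equations 4.1 and 4.2; the function names are ours; and residual sums of squares are obtained by plain Gaussian elimination instead of Efroymson's matrix update. The F values are computed as in [6.1], with each estimated conditional variance taken as the residual sum of squares divided by n-q-1.

```python
# Columns: y1, y2, y3, y4, y5 (y5 is the variable to be predicted).
ROWS = [
    (7, 26, 6, 60, 78.5), (1, 29, 15, 52, 74.3), (11, 56, 8, 20, 104.3),
    (11, 31, 8, 47, 87.6), (7, 52, 6, 33, 95.9), (11, 55, 9, 22, 109.2),
    (3, 71, 17, 6, 102.7), (1, 31, 22, 44, 72.5), (2, 54, 18, 22, 93.1),
    (21, 47, 4, 26, 115.9), (1, 40, 23, 34, 83.8), (11, 66, 9, 12, 113.3),
    (10, 68, 8, 12, 109.4),
]


def cross_products(rows):
    """Matrix of sums of squares and cross products about the means."""
    n, p = len(rows), len(rows[0])
    means = [sum(r[j] for r in rows) / n for j in range(p)]
    return [[sum((r[i] - means[i]) * (r[j] - means[j]) for r in rows)
             for j in range(p)] for i in range(p)]


def rss(sscp, target, given):
    """Residual sum of squares of `target` after regression on `given`."""
    idx = list(given) + [target]
    a = [[sscp[i][j] for j in idx] for i in idx]
    m = len(idx)
    for k in range(m - 1):
        for i in range(k + 1, m):
            factor = a[i][k] / a[k][k]
            for j in range(k, m):
                a[i][j] -= factor * a[k][j]
    return a[m - 1][m - 1]


def double_f_stepwise(rows, target, f_enter, f_remove):
    n = len(rows)
    sscp = cross_products(rows)
    inside, history = [], []
    outside = [j for j in range(len(rows[0])) if j != target]
    while True:
        q = len(inside)
        rss_in = rss(sscp, target, inside)
        # First test: does any variable in regression now fall below min F?
        if q > 0:
            s_in = rss_in / (n - q - 1)
            f_out = {}
            for j in inside:
                others = [i for i in inside if i != j]
                f_out[j] = (rss(sscp, target, others) - rss_in) / s_in
            worst = min(f_out, key=f_out.get)
            if f_out[worst] < f_remove:
                inside.remove(worst)
                outside.append(worst)
                history.append(("remove", worst, f_out[worst]))
                continue
        # Second test: statistic [6.1] for each variable not in regression.
        f_in = {}
        for j in outside:
            rss_new = rss(sscp, target, inside + [j])
            f_in[j] = (rss_in - rss_new) / (rss_new / (n - q - 2))
        if not f_in:
            break
        best = max(f_in, key=f_in.get)
        if f_in[best] <= f_enter:
            break
        inside.append(best)
        outside.remove(best)
        history.append(("add", best, f_in[best]))
    return inside, history


FINAL, HISTORY = double_f_stepwise(ROWS, target=4, f_enter=4.5, f_remove=3.0)
for action, j, f in HISTORY:
    print(f"{action:6s} y{j + 1}  F = {f:.2f}")
print("in regression:", sorted(f"y{j + 1}" for j in FINAL))
```

If the data above agree with those behind table IV, the run should retrace the same sequence of additions and the single removal; small differences in the printed F values would be expected from rounding in the matrix of equation 4.2.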

This example illustrates some complexities that arise during
the regression process that are still not completely explainable
analytically. For instance, the relative values of statistic F
changed drastically as the combination of variables in regression
changed. These values correspond to relative amounts of reduction
of conditional variance that would be due to the corresponding
variable if it were (or is) in regression. Thus, when $y_4$ was added
to regression at the first step the relative contribution in variance
reduction due to $y_1$ jumped from 13 to 108, implying that $y_1$ and
$y_4$ are much more powerful together than their sum when each is used
alone.

This example also suggests reasons why an experimenter may wish
to specify critical F values artificially, especially if results of
prior computer runs are available.

It was suggested earlier that instead of keeping track of
computed values of F [6.1], requiring specification of artificial critical
F's on the part of the experimenter, it might be simpler for him to
keep track of actual amounts of variance reduction of $y_p$ and make up
artificial values of $\Delta$ in units of variance reduction of $y_p$. Also
it is clear that the experimenter may wish to specify a value, say,
min $\Delta \le \Delta$, which would become the critical amount of variance
reduction required of each variable in regression, in order to stay in
regression.

The following summary lists a few useful halt criteria which the
experimenter may wish to specify before the automatic regression
analysis is performed on a given sample. The automatic regression
program should permit the experimenter to specify any combination
of these criteria for any given computer run:

1. $F_\alpha(1,\,n-q-1)$ and min F (Chapter VI)

2. $\Delta$ and min $\Delta$ (defined above)

3. Stop when the conditional variance of $y_p$ gets as low as
V percent of the original variance, $s_{pp}$.

4. Stop when the conditional variance of $y_p$ gets as low as T.

5. Stop when W variables have been added to regression.
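One way to let a user specify any combination of these criteria is sketched below. The class and field names are hypothetical, min F and min $\Delta$ (which govern removal rather than halting) are left out, and the numbers in the example are read from the sample run reproduced in the next chapter (critical F of 3.87 and $\Delta$ of 2.0, with conditional variances 66.7210, 5.8889 and 4.1344 after steps 1, 2 and 3).

```python
from dataclasses import dataclass
from typing import List, Optional


@dataclass
class HaltCriteria:
    """Any criterion left as None is simply not applied."""
    f_enter: Optional[float] = None      # 1. critical F for adding a variable
    delta: Optional[float] = None        # 2. least acceptable variance reduction
    v_percent: Optional[float] = None    # 3. V, percent of the original variance
    t_variance: Optional[float] = None   # 4. T, absolute conditional variance
    w_variables: Optional[int] = None    # 5. W, number of variables in regression


def halt_reasons(c: HaltCriteria, original_var: float, cond_var: float,
                 last_reduction: float, best_outside_f: Optional[float],
                 n_in: int) -> List[str]:
    """Return every criterion that would stop the step-wise procedure now."""
    reasons = []
    if (c.f_enter is not None and best_outside_f is not None
            and best_outside_f < c.f_enter):
        reasons.append("no remaining variable passes the F test")
    if c.delta is not None and last_reduction < c.delta:
        reasons.append("last variable reduced the variance by less than delta")
    if c.v_percent is not None and cond_var <= original_var * c.v_percent / 100.0:
        reasons.append("conditional variance down to V percent")
    if c.t_variance is not None and cond_var <= c.t_variance:
        reasons.append("conditional variance down to T")
    if c.w_variables is not None and n_in >= c.w_variables:
        reasons.append("W variables in regression")
    return reasons


CRITERIA = HaltCriteria(f_enter=3.87, delta=2.0)
AFTER_STEP_2 = halt_reasons(CRITERIA, 199.0421, 5.8889, 66.7210 - 5.8889, None, 2)
AFTER_STEP_3 = halt_reasons(CRITERIA, 199.0421, 4.1344, 5.8889 - 4.1344, None, 3)
print(AFTER_STEP_2)
print(AFTER_STEP_3)
```

Returning a list rather than a single flag keeps the sketch close to the idea that any one of several conditions can halt the regression steps.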

In chapter IX we shall propose a procedure using some of the
above halt criteria in searching for an optimal combination of
variables in regression.

In chapter V it was stated that one good reason for reducing
the number of variables in regression might be to reduce the cost
of observing the variables from which each future prediction of $y_p$
is to be computed. Often some of the variables cost considerably
more to observe than others, and the experimenter may not be so
interested in reducing the total number of variables to observe,
as he is in reducing the total cost to observe the values of the
variables in regression for each prediction of $y_p$ to be made later.
Thus, it is desirable that the experimenter be able to specify
observation costs, $c_i$ (say, in dollars), for each "independent"
variable $y_1,\ldots,y_{p-1}$, and have the automatic regression analysis
operation reflect these costs when selecting variables to go into
regression.

The "cost option" should differ from the regular option only in
the criterion used at each step to determine which variable is to be
added to regression. Recall that the regular option calls for choosing
the variable that will reduce the variance of $y_p$ the most, to be the
variable added.

In the cost option, at each step, those variables still not in
regression are determined. Then, instead of $y_k$, for which
$\sigma_{pp \cdot 1,\ldots,q,k}$ is least (as estimated by
$s_{pp \cdot 1,\ldots,q,k}$), $y_j$ is chosen for which

$$\frac{c_j}{\sigma_{pp \cdot 1,\ldots,q} - \sigma_{pp \cdot 1,\ldots,q,j}}$$

is least; i.e., $y_j$ is chosen on the basis that it is cheapest in
terms of "dollars" to observe per unit of variance reduction of $y_p$,
due to adding $y_j$. It is clear that the standard option is just a
special case of the cost option in which the observation costs are all
specified to be equal.
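The difference between the two options at a single step can be sketched as follows. The variance reductions and costs below are hypothetical, and the function name is ours.

```python
def best_and_cheapest(reductions, costs):
    """Pick the 'best' and the 'cheapest' candidate at one step.

    reductions -- variance reduction of y_p each candidate would give if added
    costs      -- observation cost of each candidate
    """
    usable = [j for j in reductions if reductions[j] > 0.0]
    best = max(usable, key=lambda j: reductions[j])
    cheapest = min(usable, key=lambda j: costs[j] / reductions[j])
    return best, cheapest


# Hypothetical candidates still not in regression.
REDUCTIONS = {"y1": 40.0, "y2": 30.0, "y3": 5.0}
COSTS = {"y1": 20.0, "y2": 10.0, "y3": 4.0}
print(best_and_cheapest(REDUCTIONS, COSTS))

# With equal costs the two options choose the same variable.
EQUAL = {name: 1.0 for name in REDUCTIONS}
print(best_and_cheapest(REDUCTIONS, EQUAL))
```

Candidates that would not reduce the variance at all are skipped here so that the cost ratio is always defined; that guard is a choice made for this sketch.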

Since optimality is now measured in terms of minimum cost to
observe per unit of variance reduction instead of maximum variance
reduction, the program user must be able to specify a halt criterion
so that whenever the cost to observe a variable in regression
becomes greater than, say, max C, the program will remove it, and
whenever all variables still not in regression would cost more than,
say, C dollars per unit of variance reduction, if added, the program
should halt regression. Now, min $\Delta$ and $\Delta$ are not needed as
halting criteria for the cost option. However, the experimenter should
still have the option of including other halt criteria summarized
above.

To summarize, neither Miller's nor Efroymson's stopping rules
are optimal. Both basically use only the statistical F test of
chapter VI as a decision rule for halting. It has been illustrated
here that additional decision criteria that can be specified by the
experimenter in terms more meaningful to him may greatly facilitate
his search for optimal combinations of variables in regression.


Chapter VIII


THE MV REGRESSION AND MV SIM COMPUTER PROGRAMS


The purpose of this chapter is to describe a computer program,
called MV REGRESSION, which performs automatic regression analysis
on a sample of size n. Also briefly outlined is program MV SIM
which generates samples of size n from a specified p variate normal.
The detailed operation of MV SIM is described in appendix A. Both
programs are written in NELIAC compiler language. Operation of
these programs on the Control Data Corporation model 1604 computer
at the U. S. Naval Postgraduate School has produced all of the
computations involved in the examples throughout this paper as well as
the test results discussed in chapter IX and appendix B.

Briefly, MV SIM will analyze a specified p variate normal (given
by U and $\Sigma$) and print out true regression coefficients and associated
$\sigma_{pp \cdot 1,\ldots,q}$ for any set(s) of q variables specified by the
program user ($q \le p-1$). Next, MV SIM will generate a sample of size n
from the specified p variate normal and compute sample vector Z and
sample V-C matrix, S. Before turning control to program MV REGRESSION,
MV SIM performs statistical tests on Z and S, and prints out results of
these tests, but takes no action based on these results. These
statistical tests and actual computer run results are discussed in
detail in appendix B.
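For readers who want to experiment along the same lines, the sketch below draws a sample from a specified p variate normal and computes Z and S. It is not a description of MV SIM, whose method is given in appendix A; it uses the common approach of multiplying independent standard normal deviates by a Cholesky factor of the V-C matrix. The mean vector and V-C matrix are hypothetical, the function names are ours, and S is computed with the divisor n-1, which is an assumption.

```python
import math
import random


def cholesky(sigma):
    """Lower triangular L with L times its transpose equal to sigma."""
    p = len(sigma)
    low = [[0.0] * p for _ in range(p)]
    for i in range(p):
        for j in range(i + 1):
            partial = sum(low[i][k] * low[j][k] for k in range(j))
            if i == j:
                low[i][j] = math.sqrt(sigma[i][i] - partial)
            else:
                low[i][j] = (sigma[i][j] - partial) / low[j][j]
    return low


def draw_sample(u, sigma, n, rng):
    """n observations from the normal with mean vector u and V-C matrix sigma."""
    low = cholesky(sigma)
    p = len(u)
    rows = []
    for _ in range(n):
        z = [rng.gauss(0.0, 1.0) for _ in range(p)]
        rows.append([u[i] + sum(low[i][k] * z[k] for k in range(i + 1))
                     for i in range(p)])
    return rows


def sample_vector_and_matrix(rows):
    """Sample mean vector Z and sample V-C matrix S (divisor n - 1)."""
    n, p = len(rows), len(rows[0])
    z = [sum(r[i] for r in rows) / n for i in range(p)]
    s = [[sum((r[i] - z[i]) * (r[j] - z[j]) for r in rows) / (n - 1)
          for j in range(p)] for i in range(p)]
    return z, s


# Hypothetical two variate normal.
U = [10.0, 20.0]
SIGMA = [[4.0, 2.0], [2.0, 3.0]]
SAMPLE = draw_sample(U, SIGMA, 2000, random.Random(1963))
Z, S = sample_vector_and_matrix(SAMPLE)
print([round(v, 2) for v in Z])
print([[round(v, 2) for v in row] for row in S])
```

With a sample this large, Z and S should land close to U and the specified V-C matrix, which is the kind of agreement the statistical tests mentioned above are meant to check.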

Before proceeding with a description of MV REGRESSION, it is
interesting to consider the powerful research tool one has when he
can specify a p variate normal (U and $\Sigma$) and quickly generate
random samples from that distribution. It is obvious that this
operation saves much time in gathering data, or in "making up"
reasonable samples when it is desired to test the operation of a
regression analysis program such as MV REGRESSION. (This was the
case when computations for examples in this paper were required.)
But MV SIM offers the statistician a much more useful research
capability than this. Using MV SIM one can make accurate comparisons
of the results of any regression scheme with true regression
equations, conditional variances, etc., which MV SIM computes from
the specified U and $\Sigma$. Of course, for such a comparison, the
regression scheme must be applied to a sample drawn by MV SIM from
the distribution specified by U and $\Sigma$.

The sampling capability of MV SIM also makes it possible to
perform empirical sampling studies of random variables whose
distributions are difficult to find theoretically. One such study,
now in progress, is discussed in chapter IX.

We finish this chapter with a detailed description of program
MV REGRESSION.

The inputs of MV REGRESSION are as follows:

1. Start with a sample of n observations of the p variate
   normal. If MV SIM supplies the sample, it will supply
   it in the form of Z and S.

2. Specify "standard" or "cost" option (see chapter VII).
   If cost option, give cost of observation, $c_i$, for
   variables $y_i$, for i = 1,...,p-1. If the user specifies
   "standard", he still may specify costs and obtain
   printed cost data even though the "regular" criterion is
   used as far as entering variables into regression is
   concerned.

3. Specify criteria for halting regression of a sample:

   A. 1) $F_\alpha(1,n)$, the value to be compared with statistic
         F for adding variables to regression.

      2) Min F, a value less than $F_\alpha(1,n)$ to be compared
         with statistic F for removing variables from
         regression.

   B. 1) Last variable added reduced the conditional
         variance of $y_p$ by less than $\Delta$ (not used for
         cost option).

      2) Last variable added, $y_k$, costs more than C
         dollars to observe per unit of variance reduction
         of $y_p$ due to adding $y_k$ (used only for cost option).

   C. Conditional variance of $y_p$ became less than T.

   D. Number of variables in regression reached W.
Before step 1 of the regression operation, MV REGRESSION prints
out $s_{pp}$, and (optionally) the RR matrix. (The RR matrix is a p x p
matrix which contains all current data in compact form from which
all required parameters at each step can be computed. Initially,
it is a matrix of sample correlation coefficients which is easily
computed from sample V-C matrix S. See Efroymson [3].)
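The two operations mentioned here, forming the initial RR matrix from S and updating it when a variable enters regression, can be sketched as follows. The update rule in `pivot` is an assumption inferred from the matrices printed in the sample run later in this chapter, not a transcription of Efroymson's algorithm, and the starting matrix RR0 is our reading of the printed "RR MATRIX TO START". Function names are ours.

```python
import math


def correlation_from_cov(s):
    """Matrix of correlation coefficients computed from a V-C matrix."""
    d = [math.sqrt(s[i][i]) for i in range(len(s))]
    return [[s[i][j] / (d[i] * d[j]) for j in range(len(s))]
            for i in range(len(s))]


def pivot(a, k):
    """Update the matrix when variable k enters (or leaves) regression."""
    n = len(a)
    akk = a[k][k]
    b = [[a[i][j] - a[i][k] * a[k][j] / akk for j in range(n)]
         for i in range(n)]
    for j in range(n):
        b[k][j] = a[k][j] / akk      # pivot row keeps its signs
        b[j][k] = -a[j][k] / akk     # pivot column changes sign
    b[k][k] = 1.0 / akk
    return b


# Starting RR matrix as read from the sample run (rows and columns y1..y5).
RR0 = [
    [1.0000, 0.1726, -0.8116, -0.1915, 0.6981],
    [0.1726, 1.0000, -0.0415, -0.9703, 0.8070],
    [-0.8116, -0.0415, 1.0000, -0.0734, -0.4515],
    [-0.1915, -0.9703, -0.0734, 1.0000, -0.8160],
    [0.6981, 0.8070, -0.4515, -0.8160, 1.0000],
]
RR1 = pivot(RR0, 3)   # y4 enters regression at step 1
RR2 = pivot(RR1, 0)   # y1 enters regression at step 2
print(round(RR1[4][4], 4), round(RR2[4][4], 4))
```

Under this rule the lower right element after each step is the fraction of the variance of $y_5$ left unexplained, and the last column of each row belonging to a variable in regression holds its coefficient in standardized units.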

At each step, after a variable has been added to regression,
the following data is printed:

I.   a. "Best" variable to have been added (variable with
        minimum $\sigma_{pp \cdot 1,\ldots,q,k}$).

     b. "Cheapest" variable to have been added.

     c. Whichever of the two variables above that actually
        was added (a. if regular option, b. if cost option).

II.  The value used in the F test for the added (or removed)
     variable. MV REGRESSION compared this value with the
     input value of $F_\alpha(1,n)$ (or Min F).

III. a. The square of the estimated new multiple correlation
        coefficient of $y_p$ on the variables in regression.

     b. The estimate of the new conditional variance of
        $y_p$, $s_{pp \cdot 1,\ldots,q}$.

IV.  The cost to observe the variable just added, $y_k$, per unit
     of conditional variance reduction due to the addition of
     this variable to regression at this time. This is
     computed as $c_k / (s_{pp \cdot 1,\ldots,q} - s_{pp \cdot 1,\ldots,q,k})$.

V.   a. A list of the new set of variables, $y_i$ (i = 1,...,q),
        in regression.

     b. The estimated regression coefficients, $b_i$.

VI.  The cost to observe the new set of variables in regression
     per unit of total variance reduction of $y_p$. This is
     computed as:

     $$\sum_{i=1}^{q} c_i \Big/ \left( s_{pp} - s_{pp \cdot 1,\ldots,q} \right)$$

VII. The new RR matrix (optional)


As indicated earlier, it is possible to specify cost of
observations, $c_i$, even though the standard option is used. In this case,
items IV and VI are still computed and printed, but of course, the
"best" variable to add (item Ia) is still the one actually added.

At each step at which a variable is being removed from regression,
item I above becomes "the variable just removed", and items II, III,
V, VI, and VII only are printed.

Minor changes to the program can be made to cause it to print
out other data after each step, such as estimated variances of the
estimated regression coefficients.

The next few pages show the actual program output of a regression
analysis performed by MV REGRESSION on a sample of size 300 of
a five variate normal. This sample was generated by the MV SIM
program using input vector U and V-C matrix $\Sigma$ given by 4.1 and 4.2.


MULTIVARIATE ANALYSIS (CONTINUED)

MAXIMUM REDUCTION OF THE CONDITIONAL VARIANCE OF Y 5
HOWEVER
THE FOLLOWING COSTS OF OBSERVATION ARE SPECIFIED

       Y1          Y2          Y3          Y4          Y5
    10.0000     12.0000     16.0000     20.0000       .0000

ANY ONE OF THE FOLLOWING CONDITIONS CAN HALT REGRESSION STEPS


MULTIVARIATE ANALYSIS (CONTINUED)      2 19 1963      PAGE 8

SAMPLE NUMBER 1
SAMPLE OF SIZE 300 OF THE 5 VARIATE NORMAL

SAMPLE MEANS
       Y1          Y2          Y3          Y4          Y5
     7.4125     48.2586     11.9077     29.8140     95.4458

SAMPLE VARIANCE COVARIANCE MATRIX
    31.4587     14.6327    -28.5108    -17.8985     55.3288
    14.6327    227.5360     -3.9154   -243.4449    171.7598
   -28.5108     -3.9154     39.1032     -7.6404    -39.8334
   -17.8985   -243.4449     -7.6404    276.5990   -191.4721
    55.3288    171.7598    -39.8334   -191.4721    199.0421


MULTIVARIATE ANALYSIS (CONTINUED)      2 19 1963      PAGE 10
ANALYSIS OF SAMPLE NUMBER 1

SAMPLE VARIANCE OF Y 5 = 199.0421
F LEVEL TO ENTER = 3.87     F LEVEL TO REMOVE = 3.7
RR MATRIX TO START
     1.0000       .1726      -.8116      -.1915       .6981
      .1726      1.0000      -.0415      -.9703       .8070
     -.8116      -.0415      1.0000      -.0734      -.4515
     -.1915      -.9703      -.0734      1.0000      -.8160
      .6981       .8070      -.4515      -.8160      1.0000


MULTIVARIATE ANALYSIS (CONTINUED)      2 19 1963
STEP 1

BEST VARIABLE TO ADD WAS Y 4
CHEAPEST VARIABLE TO ADD WAS Y 2
VARIABLE ADDED WAS Y 4

STATISTIC USED TO COMPARE WITH F(1, 298) = 595.9689
NEW MULTIPLE CORR COEFF SQUARED = .3340
NEW CONDITIONAL VARIANCE = 66.7210
COST TO OBSERVE Y 4 IN DOLLARS PER UNIT VARIANCE REDUCTION = .1511
NEW SET OF VARIABLES IN REGRESSION
     4
COEFFICIENTS B(I)     B0 = 116.0842
     -.6922       .0000       .0000       .0000       .0000
COST TO OBSERVE THIS SET OF VARIABLES PER UNIT
OF VARIANCE REDUCTION OF Y 5
   132.3210 UNITS OF VARIANCE REDUCTION = .1511
THE NEW RR MATRIX
       Y1          Y2          Y3          Y4          Y5
      .9632      -.0132      -.8256       .1915       .5417
     -.0132       .0582      -.1128       .9703       .0152
     -.8256      -.1128       .9946       .0734      -.5114
     -.1915      -.9703      -.0734      1.0000      -.8160
      .5417       .0152      -.5114       .8160       .3340


MULTIVARIATE ANALYSIS (CONTINUED)      2 19 1963      PAGE 12
STEP 2

BEST VARIABLE TO ADD WAS Y 1
CHEAPEST VARIABLE TO ADD WAS Y 1
VARIABLE ADDED WAS Y 1

STATISTIC USED TO COMPARE WITH F(1, 297) = 3089.6640
NEW MULTIPLE CORR COEFF SQUARED = .0293
NEW CONDITIONAL VARIANCE = 5.8889
COST TO OBSERVE Y 1 IN DOLLARS PER UNIT VARIANCE REDUCTION = .1643
NEW SET OF VARIABLES IN REGRESSION
     1     4
COEFFICIENTS B(I)     B0 = 102.8895
     1.4124      -.6008       .0000       .0000       .0000
COST TO OBSERVE THIS SET OF VARIABLES PER UNIT
OF VARIANCE REDUCTION OF Y 5
   193.1531 UNITS OF VARIANCE REDUCTION = .1553
THE NEW RR MATRIX
     1.0380      -.0137      -.8571       .1988       .5624
      .0137       .0581      -.1241       .9730       .0226
      .8571      -.1241       .2868       .2376      -.0470
      .1988      -.9730      -.2376      1.0380      -.7082
     -.5624       .0226      -.0470       .7082       .0293


MULTIVARIATE ANALYSIS (CONTINUED)      2 19 1963      PAGE 13
STEP 3

BEST VARIABLE TO ADD WAS Y 2
CHEAPEST VARIABLE TO ADD WAS Y 2
VARIABLE ADDED WAS Y 2

STATISTIC USED TO COMPARE WITH F(1, 296) = 127.4672
NEW MULTIPLE CORR COEFF SQUARED = .0205
NEW CONDITIONAL VARIANCE = 4.1344
COST TO OBSERVE Y 2 IN DOLLARS PER UNIT VARIANCE REDUCTION = 6.8394
NEW SET OF VARIABLES IN REGRESSION
     1     2     4
COEFFICIENTS B(I)     B0 = 75.6177
     1.4258       .2643      -.2792       .0000       .0000
COST TO OBSERVE THIS SET OF VARIABLES PER UNIT
OF VARIANCE REDUCTION OF Y 5
   194.9076 UNITS OF VARIANCE REDUCTION = .2154
THE NEW RR MATRIX
     1.0413       .2360      -.8864       .4285       .5677
      .2360     17.1986     -2.1349     16.7347       .3895
      .8864      2.1349       .0218      2.3150       .0012
      .4285     16.7347     -2.3150     17.3214      -.3292
     -.5677      -.3895       .0012       .3292       .0205
3) LAST VARIABLE ADDED REDUCED THE CONDITIONAL VARIANCE
   OF Y 5 BY LESS THAN 2.0


Chapter IX


CURRENT STUDIES AND PROPOSALS FOR FUTURE RESEARCH


In this chapter we discuss tests that have been started using 
programs MV SIM and MV REGRESSION. Also, plans for future research 
are proposed. 

Some tests (described in Appendix B) of a large number of 
samples generated by MV SIM have been completed. 

An empirical sampling study of the random variables involved in 
the F test of chapter VI has been started, since the form of their 
distribution is unknown and extremely difficult to obtain in closed 
form. Actually, p-1 random variables, which we will call 
$G_1, \ldots, G_{p-1}$, are under study at the same time. They are 
defined by a specified p variate normal, the size of each sample of 
the p variate normal, n, and the method of computing values of $G_i$, 
$i = 1, \ldots, p-1$, from a sample, which is described next. 

At step one, $G_1$ is defined as the maximum value of [6.1], where 
F is computed for each of the p-1 variables (none of which are in 
regression yet). $G_2$ is dependent upon $G_1$ in the sense that $G_2$ 
is the value of max [6.1] computed after the variable for which F 
equals $G_1$ has been entered into regression. Thus, at step two, max F 
is the maximum value of F for those p-2 variables still not in 
regression. The step-wise procedure continues without the use of any 
tests for halting, so that a new variable is added at each step. Thus, 
at step i, $G_i$ equals max F, where F is computed for each variable 
still not in regression by step i. After $G_i$ is recorded, the 
variable for which F = $G_i$ is entered into regression. 
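
The recording of the max F values at each step can be sketched as 
follows. This is only an illustration, not program MV REGRESSION: 
formula 6.1 is not reproduced in this chapter, so the sketch assumes 
the usual F-to-enter ratio, compared with F(1, n-q-1) where q counts 
the variables in regression once the candidate has entered. The 
function name and data layout are hypothetical. 

```python
def max_f_sequence(rows):
    """rows: n observations, each a list of p numbers; the last entry is y_p.
    Returns ([G_1, ..., G_{p-1}], order of entry as column indices)."""
    n, p = len(rows), len(rows[0])
    means = [sum(r[j] for r in rows) / n for j in range(p)]
    # Corrected sums of squares and cross products about the sample means.
    c = [[sum((r[i] - means[i]) * (r[j] - means[j]) for r in rows)
          for j in range(p)] for i in range(p)]
    y = p - 1
    remaining = list(range(p - 1))
    g_values, order = [], []
    while remaining:
        q = len(order) + 1  # variables in regression once the next one enters
        best_f, best_k = None, None
        for k in remaining:
            reduction = c[k][y] ** 2 / c[k][k]  # drop in residual sum of squares
            residual = c[y][y] - reduction      # residual sum of squares with k added
            f = reduction / (residual / (n - q - 1))  # assumed form of the F ratio
            if best_f is None or f > best_f:
                best_f, best_k = f, k
        g_values.append(best_f)
        order.append(best_k)
        remaining.remove(best_k)
        # Condition the remaining variables (and y) on the variable just entered.
        keep = remaining + [y]
        pivot = c[best_k][best_k]
        col = [c[i][best_k] for i in range(p)]
        for i in keep:
            for j in keep:
                c[i][j] -= col[i] * col[j] / pivot
    return g_values, order
```

No halting test is applied, so the loop always returns p-1 values, 
one per step, matching the definition of the $G_i$ above. 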
Since the values of [6.1] depend upon the sample size, we see 
that each sample of size n of a specified p variate normal produces 
one value of each of the random variables $G_1, \ldots, G_{p-1}$. Also, 
to obtain repeated sets of values of the same random variables, the 
sample size must be kept constant. 

The tests that have been completed were performed on the five 
variate normal specified by formulas 4.1 and 4.2. Six sample sizes: 
50, 100, 150, 200, 250, and 300 have been computed 50 times each. 
The results of $G_1$, $G_2$, $G_3$, $G_4$ for the sample size 100 are 
plotted below in the form of estimated cumulative distribution 
functions (c.d.f.'s). Where feasible, the graphs also show the curve 
of the c.d.f. of $F_{(1,n-q-1)}$. (Recall that if the F test of 
chapter VI had been applied, each value of $G_q$ would have been 
compared with a critical value of $F_{(1,n-q-1)}$.) 
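
An estimated c.d.f. of the kind plotted for each $G_i$ can be formed 
from the 50 recorded values as sketched below. The helper name is 
hypothetical and the step-function convention (jump of 1/N at each 
ordered value) is an assumption; the original plotting method is not 
described. 

```python
def estimated_cdf(values):
    """Return (x, proportion of values <= x) pairs at the ordered sample points."""
    xs = sorted(values)
    count = len(xs)
    return [(x, (i + 1) / count) for i, x in enumerate(xs)]
```

Plotting these points against the c.d.f. of F with (1, n-q-1) degrees 
of freedom gives the comparison described above. 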
So far the min d parameter test has not been implemented in 
program MV REGRESSION, so the type of artificial control of the 
step-wise process described in chapter VII has not been tested. 
However, a number of samples of size 300 of an 18 x 18 matrix (the 
same matrix used in Appendix B) have been processed by MV REGRESSION, 
using rather wide limits on the halt criteria. After examination of 
the first run it was obvious that three variables in regression were 
too many and that either one or two would be the right number. 
Since the sample size was large, most samples allowed nine or more 
variables to enter regression on the basis of passing the F test, 
even though nearly all of the variables beyond two reduced the 
estimated conditional variance of $y_{18}$ by less than 1.0 unit. By 
comparison, the first variable usually reduced this estimate from 
about 18.6 to about 6.5. An examination of the computed statistics of 
all variables (whether in regression or not) made it apparent that 
some test such as the min d test might be quite useful here. 

Advantage was taken of the fact that the true p variate normal 
was known when samples obtained from it were being analyzed by 
MV REGRESSION. For example, after the first run on several samples 
of the 18 variate normal, only six of the 17 possible predictors 
ever got into regression by the third step. Hence, all possible 
pairs of these six variables were fed back to MV SIM, for which the 
true conditional variances of $y_{18}$ were computed. 

The various halt criteria suggested in chapter VII can be useful 
in developing methods of searching for optimal combinations of 
variables in regression. It is proposed that procedures, such as the 
one described below, be tested and compared with procedures already 
described to see if better results can be obtained. 

We will assume that an experimenter has a large sample that 
perhaps was very expensive to obtain. We shall permit the 
experimenter two computer runs on the same sample, the first run 
providing a set of feed-back data for the second run. 

The main purpose of the first computer run is to determine a 
lower bound on the conditional variance of $y_p$. This is accomplished 
by using the F (and min F) test with the step-wise procedure, with 
$\alpha$ set to permit most variables to enter regression. Of course, 
at each step valuable information such as the conditional variance of 
$y_p$ and the amounts of variance reduction due to each variable 
should be printed. 

From the first run the experimenter chooses the maximum number 
of variables, say m, that he will have in his final prediction 
equation. This is usually easy to do by examining the decreasing 
values of $s_{pp \cdot 1}, s_{pp \cdot 1,2}, \ldots, s_{pp \cdot 1,\ldots,q}$, 
where q, $q \geq m$, represents the number of variables in regression 
after the first computer run. 

The purpose of the second computer run is to make a rather 
thorough (but not exhaustive) search for the optimal combination of 
m variables in regression. The procedure is to conduct p-1 separate 
regressions, each regression starting with a different first variable 
and continuing until m variables are in regression. At each step 
(after the first), the variable chosen to enter regression will be 
the variable that can contribute most reduction in the conditional 
variance of $y_p$, unless, by adding this variable, a combination that 
had been in regression previously (during a previous regression) 
would result. For example, if the first regression added variables 
in order $y_1$, $y_2$, $y_3$, then if the second regression proceeded as 
$y_2$, $y_3$, variable $y_1$ would not be permitted to enter regression 
next. Instead, the second best variable would be chosen at this step. 

Thus, after the second computer run is completed the experimenter 
will have (p-1) x m prediction equations (and conditional 
variances of $y_p$) to choose from, p-1 for each number of variables 
in regression. 
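
The proposed second run can be sketched as below. This is a minimal 
illustration under stated assumptions, not a tested procedure: it 
works from a V-C matrix given as a list of lists with $y_p$ in the 
last position, the function names are hypothetical, and it simply 
stops a regression early if every possible extension has already 
appeared (a case the text does not address). 

```python
def conditional_variance(s, subset):
    """Variance of the last variable given the variables indexed by subset,
    obtained from V-C matrix s by conditioning on one variable at a time."""
    c = [row[:] for row in s]
    size = len(c)
    for k in subset:
        pivot = c[k][k]
        col = [c[i][k] for i in range(size)]
        for i in range(size):
            for j in range(size):
                c[i][j] -= col[i] * col[j] / pivot
    return c[size - 1][size - 1]


def second_run(s, m):
    """Run one regression per starting variable, each up to m variables,
    never re-entering a combination used by an earlier regression."""
    n_pred = len(s) - 1
    seen = set()
    equations = []
    for first in range(n_pred):
        current = [first]
        while True:
            seen.add(frozenset(current))
            equations.append((tuple(current), conditional_variance(s, current)))
            if len(current) == m:
                break
            allowed = [k for k in range(n_pred)
                       if k not in current
                       and frozenset(current + [k]) not in seen]
            if not allowed:
                break  # assumption: halt when no unused combination remains
            current.append(min(
                allowed, key=lambda k: conditional_variance(s, current + [k])))
    return equations
```

Each entry of the result pairs a set of predictors (in order of entry) 
with the conditional variance it achieves, so the experimenter can 
compare the candidates for each number of variables. 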

Two further investigations are proposed. In Appendix B, the 
results of tests of a number of samples of a five and an 18 variate 
normal are described. As a result of the failure of the sample V-C 
matrices, S, of the 18 variate normal to pass the chi-square test, 
it is proposed that further testing of the multivariate normal 
generator be conducted. As indicated in Appendix B, the possibility 
of round off error should be considered. 

It is also suggested that a study be made to ascertain which 
of the two suggested tests of the matrix S is better. Possibly a 
study would indicate weakness in both. Anderson [1], section 10.8, 
describes a third test of matrix S. 

The step-wise procedure of regression analysis as described in 
this paper is called the "forward" method because it starts with no 
variables in regression and adds them to regression one at a time. 
The forward method is used here because it permits computational 
shortcuts, so that the number of computations can be minimized 
(especially so when Efroymson's computer program algorithm is used 
[3]). The backward operation of removing extraneous variables, 
however, offers no computational advantages. See Quenouille [8]. 
Another reason the forward procedure can be done with fewer 
computations is that usually the number of variables in the final 
regression is much less than p-1. Often the reason that a large 
number of independent variables is examined, compared to the number 
finally used, is that from those variables actually measured 
additional variables are often created to account for possible 
curvilinearity and interaction. For 
example, if $x$ is a variable whose value was actually measured, 
derived variables such as $Z = x^2$ may be computed and used as part 
of the original p-1 possible predictors; see [9], page 20. 

One possible advantage in using the backward method is to start 
the process by computing an estimate of the lowest possible value of 
the conditional variance of $y_p$, $s_{pp \cdot 1,\ldots,p-1}$. If 
somehow this value could be obtained before the forward procedure 
was performed, one could estimate the amount of reduction available 
in the combination of variables still not in regression at each step. 
Knowledge of this value at each step should be useful in deciding 
which way would be best to go next: i.e., to eliminate the weakest 
variables now in regression, to add the strongest variable still not 
in regression, or to halt. 
Appendix A 


GENERATION OF THE P VARIATE NORMAL 
BY PROGRAM MV SIM 


For the construction of each sample (of size one) from the 
specified p variate normal, MV SIM uses an independent sample of 
size p from the normal (0,1) distribution (i.e., mean $\mu = 0$, 
variance $\sigma^2 = 1$). 

To obtain each independent normal random sample (of size one), 
MV SIM computes a function of an independent sample of size 12 from 
the uniform (0,1) distribution (i.e., uniform on the interval zero 
to one). That this function only approximates normally distributed 
random numbers will be shown below. 

It follows from the above that to generate a sample of size n 
of a p variate normal, n x p x 12 random numbers from the uniform 
(0,1) random number generator are required. 

A discussion of several techniques for generating uniformly 
distributed "pseudo" random numbers is given by Barron [2]. 
Empirical test procedures are also given. 

The particular uniform (0,1) pseudo random number generator 
used by MV SIM is a subroutine called RAND. RAND was programmed 
according to specifications given by Green, Bert F., Jr., Smith, J. E., 
and Klem, Laura [5]. The number of initial random numbers, n in the 
reference, used by RAND is seven. This article also discusses a 
number of empirical tests that have been applied to this method. 


The method by which MV SIM uses 12 independent uniform (0,1) 
random numbers to compute each pseudo normal (0,1) random number 
is discussed by Vaa [10], see page 0. Briefly, each normal random 
number is computed as: 


$$x_i = \sum_{j=1}^{12} w_j - 6$$ 


where the $w_j$ are the required independent sample of size 12 from 
the uniform (0,1) distribution. The variance of the uniform (0,1) 
distribution is one-twelfth, and variances of independent, uniformly 
distributed random variables are additive under convolution. Hence 
it is convenient to select 12 as the number of uniform random 
variables whose sum will approximate a normal variable. Means of 
(independent) uniform variables are also additive, so that it remains 
to subtract the constant six from the sums of 12 independent uniform 
(0,1) random variables to approximate the normal (0,1) distribution. 
Vaa has a discussion of the advantages and disadvantages of this 
"truncated" approximation to the normal distribution. 

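The sum-of-twelve-uniforms computation can be sketched as follows. 
Python's standard generator stands in for subroutine RAND here, which 
is an assumption made only for illustration; the function name is 
hypothetical. 

```python
import random


def approx_standard_normal(uniform=random.random):
    """Sum of 12 uniform (0,1) numbers minus 6: mean 0, variance 12 * (1/12) = 1.
    The result can never fall outside [-6, 6], hence the 'truncated' label."""
    return sum(uniform() for _ in range(12)) - 6.0
```

A sample of size n from a p variate normal therefore calls the uniform 
generator n x p x 12 times. 
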
Wold [11], pages xi to xiii, describes the method which MV SIM 
uses to convert an independent sample of size p from the normal 
(0,1) distribution to a sample from a p variate normal specified by 
U and $\Sigma$. This method requires the computation of a p x p 
triangular matrix, $P = \{p_{ij}\}$, from the original V-C matrix, 
$\Sigma$, so that the following matrix equation holds: 

$$P P^T = \Sigma$$ 


For our discussion we arbitrarily choose the triangulation of 
$P = \{p_{ij}\}$ so that $p_{ij} = 0$ when $j > i$, i.e., let all 
"upper diagonal" elements of P equal zero. Next, assuming 
$x_1, \ldots, x_p$ is an independent sample of size p from normal 
(0,1), the sample of size one of the p variate normal is computed as: 

$$
\begin{aligned}
y_1 &= u_1 + p_{11} x_1 \\
y_2 &= u_2 + p_{21} x_1 + p_{22} x_2 \\
&\;\;\vdots \\
y_p &= u_p + p_{p1} x_1 + \cdots + p_{pp} x_p
\end{aligned}
$$

where the $u_i$ are the elements of mean vector U. 
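
The triangular factor and the transformation above can be sketched as 
follows. The factorization shown is the standard Cholesky recursion, 
which yields a lower triangular P with P times its transpose equal to 
the V-C matrix; whether MV SIM computed P in exactly this order is an 
assumption, and the function names are hypothetical. 

```python
import math
import random


def lower_triangular_factor(sigma):
    """Return lower triangular P (list of lists) such that P P^T = sigma."""
    p = len(sigma)
    P = [[0.0] * p for _ in range(p)]
    for i in range(p):
        for j in range(i + 1):
            s = sigma[i][j] - sum(P[i][k] * P[j][k] for k in range(j))
            P[i][j] = math.sqrt(s) if i == j else s / P[j][j]
    return P


def draw_observation(u, P, normal=lambda: random.gauss(0.0, 1.0)):
    """One observation y = U + P x, with x an independent normal (0,1) sample."""
    x = [normal() for _ in u]
    return [u[i] + sum(P[i][j] * x[j] for j in range(i + 1))
            for i in range(len(u))]
```

For example, the V-C matrix [[4, 2], [2, 5]] factors into 
P = [[2, 0], [1, 2]], so $y_1$ depends only on $x_1$ while $y_2$ 
depends on both $x_1$ and $x_2$. 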

The term "pseudo" random number is customarily given to numbers 
generated by arithmetic means, see Barron [2], pages 5, 6, of which 
the RAND subroutine is one. 

It is now clear that the samples of size n of the p variate 
normal generated by MV SIM are themselves pseudo random numbers, 
since they are merely arithmetic functions of uniform pseudo random 
numbers. Perhaps in this context, the operation of this part of 
MV SIM might have been called "simulation" of a p variate normal, 
rather than "generation". To carry this process one step further, 
sample mean vector Z, and V-C matrix S, being arithmetic functions 
of a sample of size n, are likewise pseudo random matrices. As in 
the case of the pseudo uniform and normal random numbers, it is 
desirable that some empirical tests be applied to these pairs of 
pseudo random matrices. 

Appendix B describes some tests in detail: one for vector Z, 
and one for matrix S. These tests are (optionally) performed by 
MV SIM on each sample, but MV SIM takes no corrective action except 
to print out the value of the computed statistics and an indication 
of the proper distribution to be compared with the statistics. 


The sequential operation of program MV SIM is as follows: 

1. Print out input mean vector U, and V-C matrix $\Sigma$, and 
   other miscellaneous data identifying the computer run. 

2. Compute the P matrix from $\Sigma$ as described above. 
   Optionally, the P matrix may be printed out. 

3. List the variance of $y_p$, $\sigma_{pp}$. 

4. Compute the prediction equation for $y_p$ for each combination 
   of variables, $y_1, \ldots, y_{p-1}$, that are specified by the 
   program user as input. For each such regression the 
   following data are printed: 

   a) regression number 

   b) q+1 variate normal, where q is the number of variables 
      in regression 

   c) multiple correlation coefficient (squared) 

   d) conditional variance of $y_p$, $\sigma_{pp \cdot 1,\ldots,q}$ 

   e) the regression coefficients, $\beta$, (optional) 

5. Print out input data regarding samples of the specified 
   distribution as described and illustrated in chapter VIII, 
   i.e., numbers of samples, observation costs, whether 
   "standard" or "cost" option is used, etc. 

The following operations are performed on each sample specified: 

6. Generate the required sample of the specified p variate 
   normal. 

7. Compute sample mean vector Z, and V-C matrix, S. Print 
   out Z and S. 

8. Test sample means, Z (optional) (see Appendix B). Print 
   out eigenvectors and eigenvalues of matrix, S, from which 
   the proper statistic is computed. Also print out the 
   statistic and the proper degrees of freedom of F to be 
   used for comparison. 

9. Test sample matrix, S (optional) (see Appendix B). 
   Print out eigenvalues of sample matrix, S. Print out 
   the statistic to be compared with chi-squared 
   distribution. Also print out proper degrees of freedom to be 
   used for comparison. 

Of course, the user of program MV SIM can omit some of the above 

items such as items 3 and at his discretion. 

The actual analysis of each sample and associated printed output 

performed by MV REGRESSION is described and illustrated in detail in 


chapter IX. 





Appendix B 


TESTS OF SAMPLE MEAN VECTOR, Z, 
AND SAMPLE VARIANCE-COVARIANCE MATRIX, S 


For a discussion of some of the problems encountered in generating 
random numbers by arithmetic means, see Barron [2] and Vaa [10]. 
Graybill [4], page 206, shows that if Y is a p variate normal 
with mean vector U and V-C matrix $\Sigma$, then the quantity: 

$$v = \frac{n\,(n-p)}{p\,(n-1)}\,(Z - U)^T S^{-1} (Z - U)$$ 

is distributed as $F_{(p,n-p)}$ if indeed Z and S are computed from a 
sample of size n from the specified p variate normal. Hence, to test 
a sample mean vector, Z, an appropriate level, $\alpha$, (usually .05) 
is chosen. Then if v is less than $F^{\alpha}_{(p,n-p)}$, vector Z is 
accepted as having been computed from a reasonable sample; otherwise 
Z is rejected. 
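
The statistic for the sample mean vector can be computed as sketched 
below. The sketch assumes the standard form of this statistic, n times 
the quadratic form in the inverse of S, scaled by (n-p)/(p(n-1)), with 
S using divisor n-1; the function name and data layout are 
hypothetical. 

```python
import math


def mean_vector_statistic(rows, u):
    """rows: n observations of p numbers; u: hypothesized mean vector U.
    Returns (v, (p, n - p)), v to be compared with F on (p, n-p) d.f."""
    n, p = len(rows), len(u)
    z = [sum(r[j] for r in rows) / n for j in range(p)]
    s = [[sum((r[i] - z[i]) * (r[j] - z[j]) for r in rows) / (n - 1)
          for j in range(p)] for i in range(p)]
    # Factor S = L L^T, then solve L w = (Z - U) by forward substitution,
    # so that (Z - U)^T S^{-1} (Z - U) equals the sum of squares of w.
    L = [[0.0] * p for _ in range(p)]
    for i in range(p):
        for j in range(i + 1):
            t = s[i][j] - sum(L[i][k] * L[j][k] for k in range(j))
            L[i][j] = math.sqrt(t) if i == j else t / L[j][j]
    w = []
    for i in range(p):
        w.append((z[i] - u[i] - sum(L[i][k] * w[k] for k in range(i))) / L[i][i])
    quad = sum(x * x for x in w)
    v = n * (n - p) * quad / (p * (n - 1))
    return v, (p, n - p)
```

With p = 1 the statistic reduces to the square of the familiar 
one-sample t statistic, which gives a quick check of the scaling. 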
To perform a test for a sample V-C matrix, S, an orthogonal 
transformation is performed on both $\Sigma$ and S, separately, 
yielding diagonal matrices $\Lambda$ and D respectively. $\Lambda$ is 
a V-C matrix of a p variate normal with independent variables (i.e., 
all covariances are equal to zero). Now, if it is true that S is 
computed from a sample drawn from a p variate normal with V-C matrix, 
$\Sigma$, then D should be a sample drawn from a p variate normal with 
V-C matrix, $\Lambda$. Hence, a test that D is a sample from $\Lambda$ 
should verify that S is a sample from $\Sigma$. 
Since each element of D, $s_i$ ($i = 1, \ldots, p$), is a sample 
variance, and since each element of $\Lambda$, $\lambda_i$, is the 
true variance corresponding to element $s_i$, for all i, intuitively, 
it appears that each of the statistics: 

$$(n-1)\,\frac{s_i}{\lambda_i}, \qquad i = 1, \ldots, p$$ 

should have the chi-square distribution with n - 1 degrees of freedom 
(n is still the sample size). From this, and the fact that $s_i$ is 
statistically independent of $s_j$ for all $i, j = 1, \ldots, p$ 
($i \neq j$), it follows that the statistic: 

$$\text{(B1)} \qquad (n-1) \sum_{i=1}^{p} \frac{s_i}{\lambda_i}$$ 

has the chi-square distribution with p(n-1) degrees of freedom, 
since the degrees of freedom of sums of independent chi-squares are 
additive. 

Hence, to test each sample V-C matrix, S, MV SIM "rotates" $\Sigma$ 
and S, and computes formula B1 above from $\Lambda$ and D. Printed out 
(optionally) are the p diagonal elements of $\Lambda$ and D (the 
eigenvalues of matrices $\Sigma$ and S respectively). Also printed are 
the result of formula B1 and the number of degrees of freedom of the 
chi-square distribution to be used for comparison. 
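
The computation of formula B1 can be sketched with a modern eigenvalue 
routine. NumPy is used here purely for illustration, and pairing the 
two sets of eigenvalues in sorted order is an assumption, since the 
text does not say how the elements of D were matched to those of the 
true diagonal matrix. The function name is hypothetical. 

```python
import numpy as np


def b1_statistic(sigma, s, n):
    """Return (B1, degrees of freedom) from true V-C matrix sigma,
    sample V-C matrix s, and sample size n."""
    lam = np.linalg.eigvalsh(np.asarray(sigma, dtype=float))  # ascending order
    d = np.linalg.eigvalsh(np.asarray(s, dtype=float))        # ascending order
    stat = (n - 1) * float(np.sum(d / lam))
    return stat, lam.size * (n - 1)
```

If S equals the true matrix exactly, every ratio is one and B1 equals 
its degrees of freedom, p(n-1). 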

Programs MV SIM and MV REGRESSION were used to generate and test 
a number of samples from two different p variate normals. One of 
these normals is specified by 4.1 and 4.2 (five variate normal). The 
other distribution was an 18 variate normal that was very close to 
being singular. (Several sets of rows were close to each other in 
value.) 

Six sample sizes: 50, 100, 150, 200, 250, and 300 were studied 
of the five variate normal, with 20 samples tested of each size. 
Four sample sizes: 50, 100, 150, and 200 were studied of the 
18 variate normal, with 20 samples tested of each size. 

For the five variate normal, both statistics (for Z and S) 
appeared to behave as samples from their respective F and chi-squared 
distributions for all sample sizes. 

However, curious results were obtained from the unusual 18 
variate normal tested. All Z tests passed as nicely as for the five 
variate normal. However, the values of chi-square were much too 
high, indicating that poor sample V-C matrices, S, were being 
generated. For example, for the 20 samples of size 100 (of the 18 
variate normal) the statistic B1 should behave as chi-square with 
1782 degrees of freedom (which is the mean of that distribution). 
The 20 computed values of B1 ranged from 2213 to 2683. 
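
To gauge how extreme these values are, the standard chi-square 
moments (mean equal to the degrees of freedom, variance equal to twice 
the degrees of freedom) can be applied. This arithmetic is an added 
illustration and does not appear in the original study: 

```latex
\begin{aligned}
\text{d.f.} &= p\,(n-1) = 18 \times 99 = 1782, \\
\text{standard deviation} &= \sqrt{2 \times 1782} \approx 59.7, \\
\frac{2213 - 1782}{59.7} &\approx 7.2, \qquad
\frac{2683 - 1782}{59.7} \approx 15.1 .
\end{aligned}
```

Even the smallest computed value therefore lies roughly seven standard 
deviations above the mean of the reference distribution. 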

A possible reason for these poor results could be the use of a 
poor random number generator. However, the satisfactory results 
obtained from testing the five variate normal, as well as tests of 
the uniform random number generator conducted previously, lead one 
to seek a different source of error. 

Possibly a more reasonable explanation is the likelihood of 
computer round off error. The large number of computations required 
to rotate an 18 x 18 matrix, plus the fact that the matrices were all 
nearly singular, could very likely cause this type of error. If this 
is the case, the generated sample V-C matrices themselves may be 
"good" samples that are merely difficult to test. 

Another interesting possibility is the method used to rotate 
matrix S for the test. Recall that rotating a symmetric matrix, 
$\Sigma$, to yield a diagonal matrix, $\Lambda$, can always be done by 
finding an orthogonal matrix, $R_1$, so that the following is 
satisfied: 

$$\text{(B2)} \qquad R_1^T\, \Sigma\, R_1 = \Lambda .$$ 

Also, since S is also symmetric, $R_2$ can be found so that 

$$R_2^T\, S\, R_2 = D,$$ 

where D is diagonal. Since $\Sigma$ and S are not exactly equal, it 
follows that orthogonal matrices $R_1$ and $R_2$ will not be equal. 

Perhaps one might argue that a "better" test might be to find 
$R_1$ from the rotation of $\Sigma$ in B2 above, and then compute: 

$$R_1^T\, S\, R_1 = D'$$ 

where $D'$ should be nearly diagonal if S is a reasonable sample from 
$\Sigma$; then compare the diagonal elements of $D'$ and $\Lambda$ as 
described above for D and $\Lambda$. 
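
This alternative can be sketched as follows, again using NumPy only 
for illustration. The function name and the choice to report the 
largest off-diagonal element of the rotated matrix are assumptions 
added for the example. 

```python
import numpy as np


def b1_with_sigma_rotation(sigma, s, n):
    """Rotate S with the orthogonal matrix R1 that diagonalizes sigma.
    Returns (statistic, degrees of freedom, largest |off-diagonal| of D')."""
    lam, r1 = np.linalg.eigh(np.asarray(sigma, dtype=float))  # columns are eigenvectors
    d_prime = r1.T @ np.asarray(s, dtype=float) @ r1
    stat = (n - 1) * float(np.sum(np.diag(d_prime) / lam))
    off_diag = d_prime - np.diag(np.diag(d_prime))
    return stat, lam.size * (n - 1), float(np.max(np.abs(off_diag)))
```

Because the same rotation is applied to both matrices, the diagonal 
elements are paired automatically, and the statistic equals (n-1) 
times the trace of the inverse of the true matrix multiplied by S. 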